A Comprehensive Deep Dive to Caesar's Cipher

Feb 23, 2024

Until now, I've only posted DeFi-related blog posts; however, from now on, I want to expand the scope of the content I create. Welcome to my first cryptography post. In this writing, we will discuss some aspects and attacks on a cryptographic scheme based on the Caesar's cipher (or shift cipher). Note that, to prevent confusion, we use small letters for the plaintext and capital ones for the ciphertext. Here is the content list:

Understanding the Caesar Cipher

Definition and Historical Significance
The Basic Mechanics of the Cipher
An Example of Caesar Cipher Encryption

The Vulnerability of Caesar's Cipher

The Concept of Brute Force Attacks
Python Code for Brute Force Decryption
Python Code for Frequency Analysis to Automate Attacks

Enhancing Caesar's Cipher: Mono-Alphabetic Substitution

The Idea of Using Permutations as Keys
Python Code for Mono-Alphabetic Substitution Encryption
The Feasibility of Brute Force Attacks
Statistical Methods in Cryptanalysis
Reference to Further Cryptanalysis Example

Cryptography was predominantly used in the military because it was crucial to share confidential information in a way that enemies could not decipher. To encrypt his messages, Julius Caesar shifted each letter of his message by 3 places. For example, he would write "D" instead of "A" and "E" instead of "B," and so on. This is one of the simplest examples of symmetric encryption, where the same key is used for both encryption and decryption; in this case, the key is 3.

"ZHE WKUHH LV D IXWXUH"

is the encryption of the sentence 'web three is a future' by Caesar's cipher with the key 3. Note that using spaces in the plaintext may leak some information, for instance when an attacker sees the letter "D" standing alone in the sentence he can easily guess that D represents "A".

Formally, Caesar’s cipher (private key encryption in general) consists of three algorithms. The first one generates keys, the second one encrypts plaintexts. We denote the encryption of the plaintext m using the key k by

\(Enc_k(m)\)

The last one is decryption which takes ciphertext and generates the corresponding plaintext. It’s denoted by

\(Dec_k(c)\)

In order to be consistent, we should have

\(Dec_k(Enc_k(m))=m\)

for every message m and key k.

Thus, methodically we say that the Caesar’s schema is

\(Enc_3(m_i) = m_i+3\)

and

\(Dec_3(c_i) = c_i-3\)

for all m_i is a char of the message and c_i is a char of the ciphertext.

Let's write a Python code that encrypts messages via a randomly generated key of Caesar's cipher. Note: you can reach all python codes we mentioned in this post here https://github.com/arbnom/Caesar-s-Cipher

import random

def caesar_cipher_encrypt(plaintext, key):
    encrypted_text = ""
    for char in plaintext:
        # Convert to lowercase if it's not
        char = char.lower()
        # Check if the character is a lowercase letter
        if char.islower():
            encrypted_char = chr((ord(char) - 97 + key) % 26 + 97)
            encrypted_text += encrypted_char.upper()  # Convert to uppercase
        else:
            # Keep the character unchanged if it's not a letter
            encrypted_text += char.upper()
    return encrypted_text

# Choose a random key between 1 and 25
key = random.randint(1, 25)
print(f"Key: {key}")

# Set the plaintext directly
plaintext = "web three is a future"

# Encrypt the plaintext
encrypted_text = caesar_cipher_encrypt(plaintext, key)

# Print the encrypted text
print(f"Encrypted Text: {encrypted_text}")

That code snippet generates a random key between 1-25 and encrypts the given plaintext according to that key. It does not change the non-alphabetic chars so typing them gives better results.

While Caesar's cipher is an engaging introduction to encryption, its simplicity makes it vulnerable to brute force attacks. In such attacks, every possible key is tested to decrypt a ciphertext. Given the 26 letters in the English alphabet, finding the correct key is straightforward. Below is a Python code snippet demonstrating a brute force attack on Caesar's cipher.

def caesar_cipher_decrypt(ciphertext, key):
    decrypted_text = ""
    for char in ciphertext:
        # Check if the character is an uppercase letter
        if 'A' <= char <= 'Z':
            # Perform decryption only on uppercase letters
            decrypted_char = chr((ord(char) - 65 - key) % 26 + 65)
            decrypted_text += decrypted_char.lower()  # Convert decrypted letters to lowercase for consistency
        else:
            # Leave non-letter characters unchanged
            decrypted_text += char
    return decrypted_text

# Ciphertext input is provided directly in the code
ciphertext = "ZHE WKUHH LV D IXWXUH"

# Try all possible keys from 1 to 25
for i in range(1, 26):
    # Decrypt the ciphertext using the current key
    decrypted_text = caesar_cipher_decrypt(ciphertext, i)
    # Print the decrypted text
    print(f"Key {i}: {decrypted_text}")

The Python code snippet provided efficiently demonstrates this by iterating through every potential key from 1 to 25 and decrypting the ciphertext "ZHE WKUHH LV D IXWXUH". The outputs for each key are as follows:

Key 1: ygd vjtgg ku c hwvwtg

Key 2: xfc uisff jt b gvuvsf

Key 3: web three is a future (revealing the original plaintext)

Key 4: vda sgqdd hr z etstqd

Key 5: ucz rfpcc gq y dsrspc

Key 6: tby qeobb fp x crqrob

Key 7: sax pdnaa eo w bqpqna

Key 8: rzw ocmzz dn v apopmz

Key 9: qyv nblyy cm u zonoly

Key 10: pxu makxx bl t ynmnkx

Key 11: owt lzjww ak s xmlmjw

Key 12: nvs kyivv zj r wlkliv

Key 13: mur jxhuu yi q vkjkhu

Key 14: ltq iwgtt xh p ujijgt

Key 15: ksp hvfss wg o tihifs

Key 16: jro guerr vf n shgher

Key 17: iqn ftdqq ue m rgfgdq

Key 18: hpm escpp td l qfefcp

Key 19: gol drboo sc k pedebo

Key 20: fnk cqann rb j odcdan

Key 21: emj bpzmm qa i ncbczm

Key 22: dli aoyll pz h mbabyl

Key 23: ckh znxkk oy g lazaxk

Key 24: bjg ymwjj nx f kzyzwj

Key 25: aif xlvii mw e jyxyvi

This exhaustive list clearly illustrates how each key transforms the ciphertext into a different plaintext, ultimately uncovering the original message with Key 3. This example underscores the critical weakness of Caesar's cipher against brute force attacks, highlighting the necessity for more sophisticated encryption methods to ensure secure communication.

While brute force attacks on Caesar's cipher test every possible key, such an approach isn't easily automated, especially if the message has been intentionally obfuscated (e.g., by substituting "q" for "p"). To address this, we can employ frequency analysis, leveraging the known frequencies of letters in the English language (denoted as p_i) to decipher the ciphertext.

For instance, the letter "e" (denoted as p_5) appears in 12.7% of cases in English texts. By analyzing the frequency of each letter in the ciphertext and comparing it to these known values, we can estimate the key used for encryption. The sum of the squares of the frequencies of letters in English is approximately 0.065, a figure we'll use to guide our decryption.

The following Python code performs frequency analysis on a given ciphertext, "ZHE WKUHH LV D IXWXUH", to estimate the encryption key:

def calculate_frequency(text):
    frequency = {chr(i + 97): 0 for i in range(26)}  # Initialize frequency dictionary for 'a' to 'z'
    for char in text.lower():  # Convert text to lowercase to standardize
        if char in frequency:
            frequency[char] += 1
    total = sum(frequency.values())
    if total > 0:
        frequency = {char: count / total for char, count in frequency.items()}  # Normalize frequencies
    return frequency

# Frequencies of letters in the English alphabet (p_i values)
english_freq = {
    'a': 0.08167, 'b': 0.01492, 'c': 0.02782, 'd': 0.04253, 'e': 0.12702,
    'f': 0.02228, 'g': 0.02015, 'h': 0.06094, 'i': 0.06966, 'j': 0.00153,
    'k': 0.00772, 'l': 0.04025, 'm': 0.02406, 'n': 0.06749, 'o': 0.07507,
    'p': 0.01929, 'q': 0.00095, 'r': 0.05987, 's': 0.06327, 't': 0.09056,
    'u': 0.02758, 'v': 0.00978, 'w': 0.02360, 'x': 0.00150, 'y': 0.01974,
    'z': 0.00074
}

# Ciphertext input
ciphertext = "ZHE WKUHH LV D IXWXUH".replace(" ", "").upper()  # Remove spaces and convert to uppercase for analysis

# List frequencies for letters in ciphertext (q_i values)
ciphertext_freq = calculate_frequency(ciphertext)

# Create a dictionary to hold I_j values
I_j_values = {}

# Calculate I_j for each possible shift j
for j in range(26):
    I_j = 0
    for i in range(26):
        char_from_english = chr((i + 97) % 126)  # Ensure we're using lowercase letters
        char_from_ciphertext = chr((i + j + 97) % 126)
        if char_from_english in english_freq and char_from_ciphertext in ciphertext_freq:
            I_j += english_freq[char_from_english] * ciphertext_freq[char_from_ciphertext]
    I_j_values[j] = I_j

# Find the key by identifying the shift with I_j value nearest to 0.065
nearest_I_j_value = min(I_j_values.values(), key=lambda x: abs(x - 0.065))
key = [k for k, v in I_j_values.items() if v == nearest_I_j_value][0]

print(f"Estimated Key: {key}")

This code first calculates the frequency of each letter in the ciphertext, then compares these frequencies against those expected in English text. The key is estimated by finding the shift that brings the product of the English and ciphertext letter frequencies closest to 0.065.

Additionally, the provided frequency_analysis_debug.py file in the repository enables a deeper dive, showing the I_j - 00.65 values for each potential key j. This can illustrate how even with short ciphertexts like "ZHE WKUHH LV D IXWXUH", the correct key results in a value closest to 0.065, aiding in the decryption process without needing an extensive ciphertext. Here is the values:

I_0 - 0.065 = 0.02695

I_1 - 0.065 = 0.03463

I_2 - 0.065 = 0.03462

I_3 - 0.065 = 0.00561

I_4 - 0.065 = 0.01560

I_5 - 0.065 = 0.03382

I_6 - 0.065 = 0.02737

I_7 - 0.065 = 0.01500

I_8 - 0.065 = 0.03250

I_9 - 0.065 = 0.03262

I_10 - 0.065 = 0.03567

I_11 - 0.065 = 0.03285

I_12 - 0.065 = 0.03496

I_13 - 0.065 = 0.03934

I_14 - 0.065 = 0.02270

I_15 - 0.065 = 0.01765

I_16 - 0.065 = 0.01072

I_17 - 0.065 = 0.02736

I_18 - 0.065 = 0.02065

I_19 - 0.065 = 0.01161

I_20 - 0.065 = 0.02051

I_21 - 0.065 = 0.03260

I_22 - 0.065 = 0.02465

I_23 - 0.065 = 0.03499

I_24 - 0.065 = 0.05015

I_25 - 0.065 = 0.02611

The Caesar cipher, while simple and historically significant, is easily compromised due to the limited number of possible keys. To enhance the security, one could consider a mono-alphabetic substitution cipher, which uses permutations of the alphabet as keys. For the 26 letters of the English alphabet, this results in a vast key space of 26! (26 factorial) permutations, making brute force attacks impractical due to the sheer number of potential combinations.

Here's a Python code snippet that implements the mono-alphabetic substitution cipher by generating a random key and using it for encryption:

import random

# Generate a random permutation of the alphabet
key = random.sample(range(26), 26)
print(f"Key: {key}")

def encrypt_with_monoalphabetic_cipher(plaintext, key):
    encrypted_text = ""
    for char in plaintext.lower():  # Ensure plaintext is in lowercase
        if char.isalpha():  # Check if the character is a letter
            # Encrypt the letter using the generated key
            encrypted_char = chr(key[ord(char) - ord('a')] + ord('A'))  # Convert to uppercase
            encrypted_text += encrypted_char
        else:
            # Non-letter characters are kept as is
            encrypted_text += char
    return encrypted_text

# Example plaintext
plaintext = "web three is a future"

# Encrypt the plaintext
encrypted_text = encrypt_with_monoalphabetic_cipher(plaintext, key)

# Print the encrypted text
print(f"Encrypted Text: {encrypted_text}")

This code will encrypt a given plaintext using a randomly generated permutation of the alphabet as the key. The resulting ciphertext will appear as a series of uppercase letters, with non-letter characters left unchanged.

Despite the significantly larger key space, the mono-alphabetic substitution cipher is not immune to attacks. Cryptanalysts can employ statistical methods to uncover patterns within the ciphertext. For instance, in English:

The letter "e" is the most frequent.

The letter "q" is typically followed by "u".

The letter "h" commonly appears between "t" and "e".

These patterns can be exploited to decipher the ciphertext without the need for brute force, particularly if a substantial amount of text is available for analysis.

For a comprehensive example of cryptanalysis applied to a mono-alphabetic encryption, check out the detailed explanation provided at the link: About Cryptanalysis of Monoalphabetic Encryption.

This introduction to Caesar's cipher is just the beginning. In future posts, I plan to delve deeper into the fascinating world of cryptography. Thank you for reading and don’t forget to subscribe my Substack.

Arbnom’s Substack

A Comprehensive Deep Dive to Caesar's Cipher