Cybersecurity¶
Python is a popular tool among hackers and engineers. In this lab, you will learn Cryptology in cybersecurity, which covers
Cryptography: Encryption and decryption using a cipher.
Cryptanalysis: Devising an attack to break a cipher.
Caesar symmetric key cipher¶
We first implements a simple cipher called the Caesar cipher.
%%html
<iframe width="800" height="415" src="https://www.youtube.com/embed/sMOZf4GN3oc" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Encrypt/decrypt a character¶
How to encrypt a character?
The following code encrypts a character char using a non-negative integer key.
cc_n = 1114112
def cc_encrypt_character(char, key):
"""
Return the encryption of a character by an integer key using Caesar cipher.
Parameters
----------
char: str
a unicode (UTF-8) character to be encrypted.
key int:
secret key to encrypt char.
"""
char_code = ord(char)
shifted_char_code = (char_code + key) % cc_n
encrypted_char = chr(shifted_char_code)
return encrypted_char
For example, to encrypt the letter 'A' using a secret key 5:
cc_encrypt_character("A", 5)
'F'
The character 'A' is encrypted to the character 'F' as follows:
ord(char)return the integer65that is the code point (integer representation) of the unicode of'A'.(char_code + key) % cc_ncyclic shifts the code by the key5.chr(shifted_char_code)converts the shifted code back to a character, which is'F'.
Encryption |
||||||||
|---|---|---|---|---|---|---|---|---|
|
… |
A |
B |
C |
D |
E |
F |
… |
|
… |
65 |
66 |
67 |
68 |
69 |
70 |
… |
|
… |
70 |
71 |
72 |
73 |
74 |
75 |
… |
|
… |
F |
G |
H |
I |
J |
K |
… |
You may learn more about ord and chr from their docstrings:
help(ord)
help(chr)
Help on built-in function ord in module builtins:
ord(c, /)
Return the Unicode code point for a one-character string.
Help on built-in function chr in module builtins:
chr(i, /)
Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.
How to decrypt a character?
Mathematically, we define the encryption and decryption of a character for Caesar cipher as
where \(x\) is the character code in \(\{0,\dots,n\}\) and \(k\) is the secret key. mod operator above is the modulo operator. In Mathematics, it has a lower precedence than addition and multiplication and is typeset with an extra space accordingly.
The encryption and decryption satisfies the recoverability condition
so two people with a common secret key can encrypt and decrypt a character, but others not knowing the key cannot. This is a defining property of a symmetric cipher.
The following code decrypts a character using a key.
def cc_decrypt_character(char, key):
"""
Return the decryption of a character by the key using Caesar cipher.
Parameters
----------
char: str
a unicode (UTF-8) character to be decrypted.
key: int
secret key to decrypt char.
"""
char_code = ord(char)
shifted_char_code = (char_code - key) % cc_n
decrypted_char = chr(shifted_char_code)
return decrypted_char
For instance, to decrypt the letter 'F' by the secret key 5:
cc_decrypt_character("F", 5)
'A'
The character 'F' is decrypted back to 'A' because
(char_code - key) % cc_n reverse cyclic shifts the code by the key 5.
Encryption |
Decryption |
||||||||
|---|---|---|---|---|---|---|---|---|---|
|
… |
A |
B |
C |
D |
E |
F |
… |
|
|
… |
65 |
66 |
67 |
68 |
69 |
70 |
… |
|
|
… |
70 |
71 |
72 |
73 |
74 |
75 |
… |
|
|
… |
F |
G |
H |
I |
J |
K |
… |
|
Exercise Why did we set cc_n = 1114112? Explain whether the recoverability property may fail if we set cc_n to a bigger number or remove % cc_n for both cc_encrypt_character and cc_decrypt_character.
YOUR ANSWER HERE
Encrypt a plaintext and decrypt a ciphertext¶
Of course, it is more interesting to encrypt a string instead of a character. The following code implements this in one line.
def cc_encrypt(plaintext, key):
"""
Return the ciphertext of a plaintext by the key using Caesar cipher.
Parameters
----------
plaintext: str
A unicode (UTF-8) message to be encrypted.
public_key: int
Public key to encrypt plaintext.
"""
return "".join([chr((ord(char) + key) % cc_n) for char in plaintext])
The above function encrypts a message, referred to as the plaintext, by replacing each character with its encryption.
This is referred to as a substitution cipher.
Exercise Define a function cc_decrypt that
takes a string
ciphertextand an integerkey, andreturns the plaintext that encrypts to
ciphertextby the key using Caesar cipher.
def cc_decrypt(ciphertext, key):
"""
Return the plaintext that encrypts to ciphertext by the key using Caesar cipher.
Parameters
----------
ciphertext: str
message to be decrypted.
key: int
secret key to decrypt the ciphertext.
"""
# YOUR CODE HERE
raise NotImplementedError()
# tests
assert cc_decrypt(r"bcdefghijklmnopqrstuvwxyz{", 1) == "abcdefghijklmnopqrstuvwxyz"
assert cc_decrypt(r"Mjqqt1%\twqi&", 5) == "Hello, World!"
# hidden tests
Brute-force attack¶
Create an English dictionary¶
You will launch a brute-force attack to guess the key that encrypts an English text. The idea is simple:
You try decrypting the ciphertext with different keys, and
see which of the resulting plaintexts make most sense (most english-like).
To check whether a plaintext is English-like, we need to have a list of English words. One way is to type them out but this is tedious. Alternatively, we can obtain the list from the Natural Language Toolkit (NLTK):
import nltk
nltk.download("words")
from nltk.corpus import words
[nltk_data] Downloading package words to /home/cs1302/nltk_data...
[nltk_data] Package words is already up-to-date!
words.words() returns a list of words. We can check whether a string is in the list using the operator in.
for word in "Ada", "ada", "Hello", "hello":
print("{!r} in dictionary? {}".format(word, word in words.words()))
'Ada' in dictionary? True
'ada' in dictionary? False
'Hello' in dictionary? False
'hello' in dictionary? True
However there are two issues:
Checking membership is slow for a long list.
Both ‘Hello’ and ‘ada’ are English-like but they are not in the words_list.
Exercise Using the method lower of str and the constructor set, assign dictionary to a set of lowercase English words from words.words().
# YOUR CODE HERE
raise NotImplementedError()
# tests
assert isinstance(dictionary, set) and len(dictionary) == 234377
assert all(word in dictionary for word in ("ada", "hello"))
assert all(word not in dictionary for word in ("Ada", "hola"))
# hidden tests
### BEGIN TESTS
assert "world" in dictionary
assert not "mundo" in dictionary
### END TESTS
Identify English-like text¶
To determine how English-like a text is, we calculate the following score:
where tokens are substrings (not necessarily an English word) separated by white space characters in the text.
def tokenizer(text):
"""Returns the list of tokens of the text."""
return text.split()
def get_score(text):
"""Returns the fraction of tokens which appear in dictionary."""
tokens = tokenizer(text)
words = [token for token in tokens if token in dictionary]
return len(words) / len(tokens)
# tests
get_score("hello world"), get_score("Hello, World!")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-16-e9ed45c44ae3> in <module>
12
13 # tests
---> 14 get_score("hello world"), get_score("Hello, World!")
<ipython-input-16-e9ed45c44ae3> in get_score(text)
7 """Returns the fraction of tokens which appear in dictionary."""
8 tokens = tokenizer(text)
----> 9 words = [token for token in tokens if token in dictionary]
10 return len(words) / len(tokens)
11
<ipython-input-16-e9ed45c44ae3> in <listcomp>(.0)
7 """Returns the fraction of tokens which appear in dictionary."""
8 tokens = tokenizer(text)
----> 9 words = [token for token in tokens if token in dictionary]
10 return len(words) / len(tokens)
11
NameError: name 'dictionary' is not defined
As shown in tests above, the code fails to handle text with punctuations and uppercase letters properly.
In particular,
while
get_scorerecognizeshello worldas English-like and returns the maximum score of 1,it fails to recognize
Hello, World!as English-like and returns the minimum score of 0.
Why? This is because every words in dictionary
are in lowercase, and
have no leading/trailing punctuations.
Exercise Define a funtion tokenizer that
takes a string
textas an argument, andreturns a
listof tokens obtained bysplitting
textinto a list usingsplit();removing leading/trailing punctuations in
string.punctuationusing thestripmethod; andconverting all items of the list to lowercase using
lower().
import string
def tokenizer(text):
"""Returns the list of tokens of the text such that
1) each token has no leading or training spaces/punctuations, and
2) all letters in each tokens are in lowercase."""
# YOUR CODE HERE
raise NotImplementedError()
# tests
assert tokenizer("Hello, World!") == ["hello", "world"]
assert get_score("Hello, World!") >= 0.99999
assert tokenizer("Do you know Jean-Pierre?") == ["do", "you", "know", "jean-pierre"]
assert get_score("Do you know Jean-Pierre?") >= 0.99999
# hidden tests
Launch a brute-force attack¶
Exercise Define the function cc_attack that
takes as arguments
a string
ciphertext,a floating point number
thresholdin the interval \((0,1)\) with a default value of \(0.6\), and
returns a generator that
generates one-by-one in ascending order guesses of the key that
decrypt
ciphertextto texts with scores at least thethreshold.
def cc_attack(ciphertext, threshold=0.6):
"""Returns a generator that generates the next guess of the key that
decrypts the ciphertext to a text with get_score(text) at least the threshold.
"""
# YOUR CODE HERE
raise NotImplementedError()
# tests
ciphertext = cc_encrypt("Hello, World!", 12345)
key_generator = cc_attack(ciphertext)
key_guess = next(key_generator)
assert key_guess == 12345
text = cc_decrypt(ciphertext, key_guess)
print(
"guess of the key: {}\nscore: {}\ntext :{}".format(key_guess, get_score(text), text)
)
# hidden tests
Challenge¶
Another symmetric key cipher is columnar transposition cipher. A transposition cipher encrypts a text by permuting instead of substituting characters.
Exercise Study and implement the irregular case of the columnar transposition cipher as described in Wikipedia page. Define the functions
ct_encrypt(plaintext, key)for encryption, andct_decrypt(ciphertext, key)for decryption.
You can assume the plaintext is in uppercase and has no spaces/punctuations.
Hints: See the text cases for an example of plaintext, key, and the corresponding ciphertext. You can but are not required to follow the solution template below:
def argsort(seq):
'''A helper function that returns the tuple of indices that would sort the
sequence seq.'''
return tuple(x[0] for x in sorted(enumerate(seq), key=lambda x: x[1]))
def ct_idx(length, key):
'''A helper function that returns the tuple of indices that would permute
the letters of a message according to the key using the irregular case of
columnar transposition cipher.'''
seq = tuple(range(length))
return [i for j in argsort(key) for i in _______________]
def ct_encrypt(plaintext, key):
"""
Return the ciphertext of a plaintext by the key using the irregular case
of columnar transposition cipher.
Parameters
----------
plaintext: str
a message in uppercase without punctuations/spaces.
key: str
secret key to encrypt plaintext.
"""
return ''.join([plaintext[i] for i in ct_idx(len(plaintext), key)])
def ct_decrypt(ciphertext, key):
"""
Return the plaintext of the ciphertext by the key using the irregular case
of columnar transposition cipher.
Parameters
----------
ciphertext: str
a string in uppercase without punctuations/spaces.
key: str
secret key to decrypt ciphertext.
"""
return _______________________________________________________________________
# YOUR CODE HERE
raise NotImplementedError()
# tests
key = "ZEBRAS"
plaintext = "WEAREDISCOVEREDFLEEATONCE"
ciphertext = "EVLNACDTESEAROFODEECWIREE"
assert ct_encrypt(plaintext, key) == ciphertext
assert ct_decrypt(ciphertext, key) == plaintext
# hidden tests