{
"cells": [
{
"cell_type": "markdown",
"id": "92ad4626",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Cybersecurity"
]
},
{
"cell_type": "markdown",
"id": "0152ed3a",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"remove-cell"
]
},
"source": [
"**CS1302 Introduction to Computer Programming**\n",
"___"
]
},
{
"cell_type": "markdown",
"id": "bc221fee",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Python is a popular tool among hackers and engineers. In this lab, you will learn Cryptology in cybersecurity, which covers\n",
"- [Cryptography](https://en.wikipedia.org/wiki/Cryptography): Encryption and decryption using a cipher.\n",
"- [Cryptanalysis](https://en.wikipedia.org/wiki/Cryptanalysis): Devising an attack to break a cipher."
]
},
{
"cell_type": "markdown",
"id": "8165adbc",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Caesar symmetric key cipher"
]
},
{
"cell_type": "markdown",
"id": "38edbdd7",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We first implements a simple cipher called the [Caesar cipher](https://en.wikipedia.org/wiki/Caesar_cipher)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "377a2a79",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%html\n",
""
]
},
{
"cell_type": "markdown",
"id": "e89a9d2d",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Encrypt/decrypt a character"
]
},
{
"cell_type": "markdown",
"id": "71108895",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**How to encrypt a character?**"
]
},
{
"cell_type": "markdown",
"id": "0dc957d7",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The following code encrypts a character `char` using a non-negative integer `key`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5b11fee2",
"metadata": {
"code_folding": [],
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"cc_n = 1114112\n",
"\n",
"\n",
"def cc_encrypt_character(char, key):\n",
" \"\"\"\n",
" Return the encryption of a character by an integer key using Caesar cipher.\n",
"\n",
" Parameters\n",
" ----------\n",
" char: str \n",
" a unicode (UTF-8) character to be encrypted.\n",
" key int:\n",
" secret key to encrypt char.\n",
" \"\"\"\n",
" char_code = ord(char)\n",
" shifted_char_code = (char_code + key) % cc_n\n",
" encrypted_char = chr(shifted_char_code)\n",
" return encrypted_char"
]
},
{
"cell_type": "markdown",
"id": "058ffe8e",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For example, to encrypt the letter `'A'` using a secret key `5`:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f924a318",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'F'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cc_encrypt_character(\"A\", 5)"
]
},
{
"cell_type": "markdown",
"id": "5f3a93e2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The character `'A'` is encrypted to the character `'F'` as follows:\n",
"\n",
"1. `ord(char)` return the integer `65` that is the code point (integer representation) of the unicode of `'A'`. \n",
"2. `(char_code + key) % cc_n` cyclic shifts the code by the key `5`.\n",
"3. `chr(shifted_char_code)` converts the shifted code back to a character, which is `'F'`.\n",
"\n",
"| Encryption | | | | | | | | |\n",
"| ------------------------------- | --- | ----- | --- | --- | --- | --- | --- | --- |\n",
"| `char` | ... | **A** | B | C | D | E | F | ... |\n",
"| `ord(char)` | ... | **65**| 66 | 67 | 68 | 69 | 70 | ... |\n",
"| `(ord(char) + key) % cc_n` | ... | **70**| 71 | 72 | 73 | 74 | 75 | ... |\n",
"| `(chr(ord(char) + key) % cc_n)` | ... | **F** | G | H | I | J | K | ... |"
]
},
{
"cell_type": "markdown",
"id": "a1b090db",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"You may learn more about `ord` and `chr` from their docstrings:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "edb8d679",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on built-in function ord in module builtins:\n",
"\n",
"ord(c, /)\n",
" Return the Unicode code point for a one-character string.\n",
"\n",
"Help on built-in function chr in module builtins:\n",
"\n",
"chr(i, /)\n",
" Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.\n",
"\n"
]
}
],
"source": [
"help(ord)\n",
"help(chr)"
]
},
{
"cell_type": "markdown",
"id": "93a54e32",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**How to decrypt a character?**"
]
},
{
"cell_type": "markdown",
"id": "4162d16c",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Mathematically, we define the encryption and decryption of a character for Caesar cipher as\n",
"\n",
"$$ \\begin{aligned} E(x,k) &:= x + k \\mod n & \\text{(encryption)} \\\\\n",
"D(x,k) &:= x - k \\mod n & \\text{(decryption),} \\end{aligned}\n",
"$$\n",
"where $x$ is the character code in $\\{0,\\dots,n\\}$ and $k$ is the secret key. `mod` operator above is the modulo operator. In Mathematics, it has a lower precedence than addition and multiplication and is typeset with an extra space accordingly."
]
},
{
"cell_type": "markdown",
"id": "c1682dea",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The encryption and decryption satisfies the recoverability condition\n",
"\n",
"$$ D(E(x,k),k) = x $$\n",
"so two people with a common secret key can encrypt and decrypt a character, but others not knowing the key cannot. This is a defining property of a [symmetric cipher](https://en.wikipedia.org/wiki/Symmetric-key_algorithm)."
]
},
{
"cell_type": "markdown",
"id": "4467db5b",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The following code decrypts a character using a key."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0fc73c5e",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"def cc_decrypt_character(char, key):\n",
" \"\"\"\n",
" Return the decryption of a character by the key using Caesar cipher.\n",
"\n",
" Parameters\n",
" ----------\n",
" char: str\n",
" a unicode (UTF-8) character to be decrypted.\n",
" key: int\n",
" secret key to decrypt char.\n",
" \"\"\"\n",
" char_code = ord(char)\n",
" shifted_char_code = (char_code - key) % cc_n\n",
" decrypted_char = chr(shifted_char_code)\n",
" return decrypted_char"
]
},
{
"cell_type": "markdown",
"id": "7bb24ea6",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For instance, to decrypt the letter `'F'` by the secret key `5`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a0a033c7",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'A'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cc_decrypt_character(\"F\", 5)"
]
},
{
"cell_type": "markdown",
"id": "8a014cb2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The character `'F'` is decrypted back to `'A'` because\n",
"`(char_code - key) % cc_n` reverse cyclic shifts the code by the key `5`.\n",
"\n",
"| Encryption | | | | | | | | | Decryption |\n",
"| ------------------------------- | --- | ----- | --- | --- | --- | --- | --- | --- | ------------------------------- |\n",
"| `char` | ... | **A** | B | C | D | E | F | ... | `(chr(ord(char) - key) % cc_n)` |\n",
"| `ord(char)` | ... | **65**| 66 | 67 | 68 | 69 | 70 | ... | `(ord(char) - key) % cc_n` |\n",
"| `(ord(char) + key) % cc_n` | ... | **70**| 71 | 72 | 73 | 74 | 75 | ... | `ord(char)` |\n",
"| `(chr(ord(char) + key) % cc_n)` | ... | **F** | G | H | I | J | K | ... | `char` |"
]
},
{
"cell_type": "markdown",
"id": "493d21a1",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Exercise** Why did we set `cc_n = 1114112`? Explain whether the recoverability property may fail if we set `cc_n` to a bigger number or remove `% cc_n` for both `cc_encrypt_character` and `cc_decrypt_character`."
]
},
{
"cell_type": "markdown",
"id": "93393d49",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "e73c497e70f1a86fe0ead6483b3a7d17",
"grade": true,
"grade_id": "modulo",
"locked": false,
"points": 1,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "-"
}
},
"source": [
"YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"id": "135a8d43",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Encrypt a plaintext and decrypt a ciphertext"
]
},
{
"cell_type": "markdown",
"id": "dfdd7ca2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Of course, it is more interesting to encrypt a string instead of a character. The following code implements this in one line."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "69eccae8",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"def cc_encrypt(plaintext, key):\n",
" \"\"\"\n",
" Return the ciphertext of a plaintext by the key using Caesar cipher.\n",
"\n",
" Parameters\n",
" ----------\n",
" plaintext: str\n",
" A unicode (UTF-8) message to be encrypted.\n",
" public_key: int\n",
" Public key to encrypt plaintext.\n",
" \"\"\"\n",
" return \"\".join([chr((ord(char) + key) % cc_n) for char in plaintext])"
]
},
{
"cell_type": "markdown",
"id": "3eb50a3e",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The above function encrypts a message, referred to as the *plaintext*, by replacing each character with its encryption. \n",
"This is referred to as a [*substitution cipher*](https://en.wikipedia.org/wiki/Substitution_cipher)."
]
},
{
"cell_type": "markdown",
"id": "8eeb9384",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Exercise** Define a function `cc_decrypt` that\n",
"- takes a string `ciphertext` and an integer `key`, and\n",
"- returns the plaintext that encrypts to `ciphertext` by the key using Caesar cipher."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "07a0b200",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "99cece2794f8673cbc4b9862ba6f38a4",
"grade": false,
"grade_id": "cc_decrypt",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"remove-output"
]
},
"outputs": [],
"source": [
"def cc_decrypt(ciphertext, key):\n",
" \"\"\"\n",
" Return the plaintext that encrypts to ciphertext by the key using Caesar cipher.\n",
"\n",
" Parameters\n",
" ----------\n",
" ciphertext: str\n",
" message to be decrypted.\n",
" key: int\n",
" secret key to decrypt the ciphertext.\n",
" \"\"\"\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f61fc2e9",
"metadata": {
"code_folding": [],
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "94e426a0a59da6e0e8d1b8ff16f8eae5",
"grade": true,
"grade_id": "test-cc_decrypt",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [
{
"ename": "NotImplementedError",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mcc_decrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mr\"bcdefghijklmnopqrstuvwxyz{\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"abcdefghijklmnopqrstuvwxyz\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mcc_decrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mr\"Mjqqt1%\\twqi&\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mcc_decrypt\u001b[0;34m(ciphertext, key)\u001b[0m\n\u001b[1;32m 11\u001b[0m \"\"\"\n\u001b[1;32m 12\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m: "
]
}
],
"source": [
"# tests\n",
"assert cc_decrypt(r\"bcdefghijklmnopqrstuvwxyz{\", 1) == \"abcdefghijklmnopqrstuvwxyz\"\n",
"assert cc_decrypt(r\"Mjqqt1%\\twqi&\", 5) == \"Hello, World!\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fcf36815",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "5d06619428a17c86edc6f46896203a67",
"grade": true,
"grade_id": "h_test-cc_decrypt",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [],
"source": [
"# hidden tests"
]
},
{
"cell_type": "markdown",
"id": "b1366ee3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Brute-force attack"
]
},
{
"cell_type": "markdown",
"id": "cdae8437",
"metadata": {},
"source": [
"### Create an English dictionary"
]
},
{
"cell_type": "markdown",
"id": "b509fcda",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"You will launch a brute-force attack to guess the key that encrypts an English text. The idea is simple: \n",
"\n",
"- You try decrypting the ciphertext with different keys, and \n",
"- see which of the resulting plaintexts make most sense (most english-like)."
]
},
{
"cell_type": "markdown",
"id": "c003f7fb",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"To check whether a plaintext is English-like, we need to have a list of English words. One way is to type them out\n",
"but this is tedious. Alternatively, we can obtain the list from the *Natural Language Toolkit (NLTK)*:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "097de725",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package words to /home/cs1302/nltk_data...\n",
"[nltk_data] Package words is already up-to-date!\n"
]
}
],
"source": [
"import nltk\n",
"\n",
"nltk.download(\"words\")\n",
"from nltk.corpus import words"
]
},
{
"cell_type": "markdown",
"id": "2732a10b",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"`words.words()` returns a list of words. We can check whether a string is in the list using the operator `in`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "825eb290",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'Ada' in dictionary? True\n",
"'ada' in dictionary? False\n",
"'Hello' in dictionary? False\n",
"'hello' in dictionary? True\n"
]
}
],
"source": [
"for word in \"Ada\", \"ada\", \"Hello\", \"hello\":\n",
" print(\"{!r} in dictionary? {}\".format(word, word in words.words()))"
]
},
{
"cell_type": "markdown",
"id": "cebc98d4",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"However there are two issues:\n",
"- Checking membership is slow for a long list.\n",
"- Both 'Hello' and 'ada' are English-like but they are not in the words_list."
]
},
{
"cell_type": "markdown",
"id": "a2440134",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Exercise** Using the method `lower` of `str` and the constructor `set`, assign `dictionary` to a set of lowercase English words from `words.words()`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "afb32693",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "c088033e9235e017fc574ee035ead4ba",
"grade": false,
"grade_id": "nltk",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"remove-output"
]
},
"outputs": [
{
"ename": "NotImplementedError",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m: "
]
}
],
"source": [
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "02427c26",
"metadata": {
"code_folding": [],
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "c74e6405fd76a61161c63ae219984445",
"grade": true,
"grade_id": "test-nltk",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'dictionary' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdictionary\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdictionary\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m234377\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mword\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mword\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"ada\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mword\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mword\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"Ada\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"hola\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'dictionary' is not defined"
]
}
],
"source": [
"# tests\n",
"assert isinstance(dictionary, set) and len(dictionary) == 234377\n",
"assert all(word in dictionary for word in (\"ada\", \"hello\"))\n",
"assert all(word not in dictionary for word in (\"Ada\", \"hola\"))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e16864da",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "16fa129106e563d1db39c2e4889b23cd",
"grade": true,
"grade_id": "h_test-nltk",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'dictionary' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# hidden tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m### BEGIN TESTS\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0;34m\"world\"\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;34m\"mundo\"\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m### END TESTS\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'dictionary' is not defined"
]
}
],
"source": [
"# hidden tests\n",
"### BEGIN TESTS\n",
"assert \"world\" in dictionary\n",
"assert not \"mundo\" in dictionary\n",
"### END TESTS"
]
},
{
"cell_type": "markdown",
"id": "43e28b22",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Identify English-like text"
]
},
{
"cell_type": "markdown",
"id": "bee2f09f",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"To determine how English-like a text is, we calculate the following score:\n",
"\n",
"$$\n",
"\\frac{\\text{number of English words in the text}}{\\text{number of tokens in the text}} \n",
"$$\n",
"where tokens are substrings (not necessarily an English word) separated by white space characters in the text."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "e4b697e6",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'dictionary' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 14\u001b[0;31m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"hello world\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m\u001b[0m in \u001b[0;36mget_score\u001b[0;34m(text)\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;34m\"\"\"Returns the fraction of tokens which appear in dictionary.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mwords\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mtoken\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mwords\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;34m\"\"\"Returns the fraction of tokens which appear in dictionary.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mwords\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mtoken\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mwords\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'dictionary' is not defined"
]
}
],
"source": [
"def tokenizer(text):\n",
" \"\"\"Returns the list of tokens of the text.\"\"\"\n",
" return text.split()\n",
"\n",
"\n",
"def get_score(text):\n",
" \"\"\"Returns the fraction of tokens which appear in dictionary.\"\"\"\n",
" tokens = tokenizer(text)\n",
" words = [token for token in tokens if token in dictionary]\n",
" return len(words) / len(tokens)\n",
"\n",
"\n",
"# tests\n",
"get_score(\"hello world\"), get_score(\"Hello, World!\")"
]
},
{
"cell_type": "markdown",
"id": "0f25cca2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As shown in tests above, the code fails to handle text with punctuations and uppercase letters properly. \n",
"In particular, \n",
"- while `get_score` recognizes `hello world` as English-like and returns the maximum score of 1, \n",
"- it fails to recognize `Hello, World!` as English-like and returns the minimum score of 0."
]
},
{
"cell_type": "markdown",
"id": "2a16aed9",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Why? This is because every words in `dictionary`\n",
"- are in lowercase, and\n",
"- have no leading/trailing punctuations."
]
},
{
"cell_type": "markdown",
"id": "bd2e7c07",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Exercise** Define a funtion `tokenizer` that \n",
"- takes a string `text` as an argument, and\n",
"- returns a `list` of tokens obtained by\n",
" 1. splitting `text` into a list using `split()`;\n",
" 2. removing leading/trailing punctuations in `string.punctuation` using the `strip` method; and\n",
" 3. converting all items of the list to lowercase using `lower()`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "17072e3c",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "42c1cbeebeac28ba4e9cb3ebdd2fe391",
"grade": false,
"grade_id": "tokenizer",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"remove-output"
]
},
"outputs": [],
"source": [
"import string\n",
"\n",
"\n",
"def tokenizer(text):\n",
" \"\"\"Returns the list of tokens of the text such that\n",
" 1) each token has no leading or training spaces/punctuations, and\n",
" 2) all letters in each tokens are in lowercase.\"\"\"\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0134b127",
"metadata": {
"code_folding": [],
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "5852278c832b32a9b3dce50b7544ccb7",
"grade": true,
"grade_id": "test-tokenizer",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [
{
"ename": "NotImplementedError",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"world\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0.99999\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Do you know Jean-Pierre?\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m\"do\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"you\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"know\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"jean-pierre\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Do you know Jean-Pierre?\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0.99999\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mtokenizer\u001b[0;34m(text)\u001b[0m\n\u001b[1;32m 7\u001b[0m 2) all letters in each tokens are in lowercase.\"\"\"\n\u001b[1;32m 8\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m: "
]
}
],
"source": [
"# tests\n",
"assert tokenizer(\"Hello, World!\") == [\"hello\", \"world\"]\n",
"assert get_score(\"Hello, World!\") >= 0.99999\n",
"assert tokenizer(\"Do you know Jean-Pierre?\") == [\"do\", \"you\", \"know\", \"jean-pierre\"]\n",
"assert get_score(\"Do you know Jean-Pierre?\") >= 0.99999"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "7f0f4595",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "132a8343cac2a507fcb365095a1e4092",
"grade": true,
"grade_id": "h_test-tokenizer",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [],
"source": [
"# hidden tests"
]
},
{
"cell_type": "markdown",
"id": "ea7cd782",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Launch a brute-force attack"
]
},
{
"cell_type": "markdown",
"id": "f8524adb",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Exercise** Define the function `cc_attack` that \n",
"- takes as arguments\n",
" - a string `ciphertext`,\n",
" - a floating point number `threshold` in the interval $(0,1)$ with a default value of $0.6$, and\n",
"- returns a generator that \n",
" - generates one-by-one in ascending order guesses of the key that\n",
" - decrypt `ciphertext` to texts with scores at least the `threshold`."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "e1581b44",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "074c5611790ab330bc0366632c0247bf",
"grade": false,
"grade_id": "cc_attack",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"remove-output"
]
},
"outputs": [],
"source": [
"def cc_attack(ciphertext, threshold=0.6):\n",
" \"\"\"Returns a generator that generates the next guess of the key that\n",
" decrypts the ciphertext to a text with get_score(text) at least the threshold.\n",
" \"\"\"\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "748bbce0",
"metadata": {
"code_folding": [],
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "a6b4220fb893ff497a21db970fcb85af",
"grade": true,
"grade_id": "test-cc_attack",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [
{
"ename": "NotImplementedError",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mciphertext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcc_encrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m12345\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mkey_generator\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcc_attack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mciphertext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mkey_guess\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey_generator\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mkey_guess\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m12345\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mcc_attack\u001b[0;34m(ciphertext, threshold)\u001b[0m\n\u001b[1;32m 4\u001b[0m \"\"\"\n\u001b[1;32m 5\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m: "
]
}
],
"source": [
"# tests\n",
"ciphertext = cc_encrypt(\"Hello, World!\", 12345)\n",
"key_generator = cc_attack(ciphertext)\n",
"key_guess = next(key_generator)\n",
"assert key_guess == 12345\n",
"text = cc_decrypt(ciphertext, key_guess)\n",
"print(\n",
" \"guess of the key: {}\\nscore: {}\\ntext :{}\".format(key_guess, get_score(text), text)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "3d3f879c",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "104ff78202eb48001f635f1b22d55872",
"grade": true,
"grade_id": "h_test-cc_attack",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [],
"source": [
"# hidden tests"
]
},
{
"cell_type": "markdown",
"id": "0b909b80",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Challenge"
]
},
{
"cell_type": "markdown",
"id": "66f274dd",
"metadata": {},
"source": [
"Another symmetric key cipher is [columnar transposition cipher](https://en.wikipedia.org/wiki/Transposition_cipher#Columnar_transposition). A transposition cipher encrypts a text by permuting instead of substituting characters."
]
},
{
"cell_type": "markdown",
"id": "fd59af11",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Exercise** Study and implement the irregular case of the [columnar transposition cipher](https://en.wikipedia.org/wiki/Transposition_cipher#Columnar_transposition) as described in Wikipedia page. Define the functions \n",
"- `ct_encrypt(plaintext, key)` for encryption, and \n",
"- `ct_decrypt(ciphertext, key)` for decryption. \n",
"\n",
"You can assume the plaintext is in uppercase and has no spaces/punctuations."
]
},
{
"cell_type": "markdown",
"id": "7eae76fc",
"metadata": {},
"source": [
"*Hints:* See the text cases for an example of `plaintext`, `key`, and the corresponding `ciphertext`. You can but are not required to follow the solution template below:\n",
"\n",
"```Python\n",
"def argsort(seq):\n",
" '''A helper function that returns the tuple of indices that would sort the\n",
" sequence seq.'''\n",
" return tuple(x[0] for x in sorted(enumerate(seq), key=lambda x: x[1]))\n",
"\n",
"\n",
"def ct_idx(length, key):\n",
" '''A helper function that returns the tuple of indices that would permute \n",
" the letters of a message according to the key using the irregular case of \n",
" columnar transposition cipher.'''\n",
" seq = tuple(range(length))\n",
" return [i for j in argsort(key) for i in _______________]\n",
"\n",
"\n",
"def ct_encrypt(plaintext, key):\n",
" \"\"\"\n",
" Return the ciphertext of a plaintext by the key using the irregular case\n",
" of columnar transposition cipher.\n",
"\n",
" Parameters\n",
" ----------\n",
" plaintext: str\n",
" a message in uppercase without punctuations/spaces.\n",
" key: str\n",
" secret key to encrypt plaintext.\n",
" \"\"\"\n",
" return ''.join([plaintext[i] for i in ct_idx(len(plaintext), key)])\n",
"\n",
"\n",
"def ct_decrypt(ciphertext, key):\n",
" \"\"\"\n",
" Return the plaintext of the ciphertext by the key using the irregular case\n",
" of columnar transposition cipher.\n",
"\n",
" Parameters\n",
" ----------\n",
" ciphertext: str\n",
" a string in uppercase without punctuations/spaces.\n",
" key: str\n",
" secret key to decrypt ciphertext.\n",
" \"\"\"\n",
" return _______________________________________________________________________\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f63c9722",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "98df17cad7f85438965c855febc2b736",
"grade": false,
"grade_id": "ct",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"remove-output"
]
},
"outputs": [
{
"ename": "NotImplementedError",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNotImplementedError\u001b[0m: "
]
}
],
"source": [
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "f7ea0245",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "a893a09904c9aae29dafdd12e68c7318",
"grade": true,
"grade_id": "test-ct",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "-"
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'ct_encrypt' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mplaintext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"WEAREDISCOVEREDFLEEATONCE\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mciphertext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"EVLNACDTESEAROFODEECWIREE\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mct_encrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mplaintext\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mciphertext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mct_decrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mciphertext\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mplaintext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'ct_encrypt' is not defined"
]
}
],
"source": [
"# tests\n",
"key = \"ZEBRAS\"\n",
"plaintext = \"WEAREDISCOVEREDFLEEATONCE\"\n",
"ciphertext = \"EVLNACDTESEAROFODEECWIREE\"\n",
"assert ct_encrypt(plaintext, key) == ciphertext\n",
"assert ct_decrypt(ciphertext, key) == plaintext"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "bcc6c4be",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "08ae4d5bf0ccdaf09bd2ac518795f261",
"grade": true,
"grade_id": "h_test-ct",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": [
"hide-input",
"remove-output"
]
},
"outputs": [],
"source": [
"# hidden tests"
]
}
],
"metadata": {
"jupytext": {
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"source_map": [
14,
18,
23,
29,
33,
37,
47,
51,
55,
59,
85,
89,
97,
112,
116,
125,
129,
138,
145,
149,
171,
175,
183,
195,
199,
203,
207,
211,
230,
235,
241,
272,
296,
315,
319,
323,
330,
335,
346,
350,
359,
365,
369,
389,
414,
437,
441,
450,
471,
478,
484,
493,
520,
546,
565,
569,
579,
603,
633,
652,
656,
660,
668,
717,
737,
763
]
},
"nbformat": 4,
"nbformat_minor": 5
}