{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "92ad4626",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Cybersecurity"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0152ed3a",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "remove-cell"
    ]
   },
   "source": [
    "**CS1302 Introduction to Computer Programming**\n",
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc221fee",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Python is a popular tool among hackers and engineers. In this lab, you will learn Cryptology in cybersecurity, which covers\n",
    "- [Cryptography](https://en.wikipedia.org/wiki/Cryptography): Encryption and decryption using a cipher.\n",
    "- [Cryptanalysis](https://en.wikipedia.org/wiki/Cryptanalysis): Devising an attack to break a cipher."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8165adbc",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Caesar symmetric key cipher"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38edbdd7",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We first implements a simple cipher called the [Caesar cipher](https://en.wikipedia.org/wiki/Caesar_cipher)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "377a2a79",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe width=\"800\" height=\"415\" src=\"https://www.youtube.com/embed/sMOZf4GN3oc\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<iframe width=\"800\" height=\"415\" src=\"https://www.youtube.com/embed/sMOZf4GN3oc\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e89a9d2d",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Encrypt/decrypt a character"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71108895",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**How to encrypt a character?**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0dc957d7",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The following code encrypts a character `char` using a non-negative integer `key`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5b11fee2",
   "metadata": {
    "code_folding": [],
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "cc_n = 1114112\n",
    "\n",
    "\n",
    "def cc_encrypt_character(char, key):\n",
    "    \"\"\"\n",
    "    Return the encryption of a character by an integer key using Caesar cipher.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    char: str \n",
    "        a unicode (UTF-8) character to be encrypted.\n",
    "    key int:\n",
    "        secret key to encrypt char.\n",
    "    \"\"\"\n",
    "    char_code = ord(char)\n",
    "    shifted_char_code = (char_code + key) % cc_n\n",
    "    encrypted_char = chr(shifted_char_code)\n",
    "    return encrypted_char"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "058ffe8e",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "For example, to encrypt the letter `'A'` using a secret key `5`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f924a318",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'F'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cc_encrypt_character(\"A\", 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f3a93e2",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The character `'A'` is encrypted to the character `'F'` as follows:\n",
    "\n",
    "1. `ord(char)` return the integer `65` that is the code point (integer representation) of the unicode of `'A'`. \n",
    "2. `(char_code + key) % cc_n` cyclic shifts the code by the key `5`.\n",
    "3. `chr(shifted_char_code)` converts the shifted code back to a character, which is `'F'`.\n",
    "\n",
    "| Encryption                      |     |       |     |     |     |     |     |     |\n",
    "| ------------------------------- | --- | ----- | --- | --- | --- | --- | --- | --- |\n",
    "| `char`                          | ... | **A** | B   | C   | D   | E   | F   | ... |\n",
    "| `ord(char)`                     | ... | **65**| 66  | 67  | 68  | 69  | 70  | ... |\n",
    "| `(ord(char) + key) % cc_n`      | ... | **70**| 71  | 72  | 73  | 74  | 75  | ... |\n",
    "| `(chr(ord(char) + key) % cc_n)` | ... | **F** | G   | H   | I   | J   | K   | ... |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b090db",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "You may learn more about `ord` and `chr` from their docstrings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "edb8d679",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on built-in function ord in module builtins:\n",
      "\n",
      "ord(c, /)\n",
      "    Return the Unicode code point for a one-character string.\n",
      "\n",
      "Help on built-in function chr in module builtins:\n",
      "\n",
      "chr(i, /)\n",
      "    Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(ord)\n",
    "help(chr)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93a54e32",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**How to decrypt a character?**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4162d16c",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Mathematically, we define the encryption and decryption of a character for Caesar cipher as\n",
    "\n",
    "$$ \\begin{aligned} E(x,k) &:= x + k \\mod n & \\text{(encryption)} \\\\\n",
    "D(x,k) &:= x - k \\mod n & \\text{(decryption),} \\end{aligned}\n",
    "$$\n",
    "where $x$ is the character code in $\\{0,\\dots,n\\}$ and $k$ is the secret key. `mod` operator above is the modulo operator. In Mathematics, it has a lower precedence than addition and multiplication and is typeset with an extra space accordingly."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1682dea",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The encryption and decryption satisfies the recoverability condition\n",
    "\n",
    "$$ D(E(x,k),k) = x $$\n",
    "so two people with a common secret key can encrypt and decrypt a character, but others not knowing the key cannot. This is a defining property of a [symmetric cipher](https://en.wikipedia.org/wiki/Symmetric-key_algorithm)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4467db5b",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The following code decrypts a character using a key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0fc73c5e",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "def cc_decrypt_character(char, key):\n",
    "    \"\"\"\n",
    "    Return the decryption of a character by the key using Caesar cipher.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    char: str\n",
    "        a unicode (UTF-8) character to be decrypted.\n",
    "    key: int\n",
    "        secret key to decrypt char.\n",
    "    \"\"\"\n",
    "    char_code = ord(char)\n",
    "    shifted_char_code = (char_code - key) % cc_n\n",
    "    decrypted_char = chr(shifted_char_code)\n",
    "    return decrypted_char"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bb24ea6",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "For instance, to decrypt the letter `'F'` by the secret key `5`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a0a033c7",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'A'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cc_decrypt_character(\"F\", 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a014cb2",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The character `'F'` is decrypted back to `'A'` because\n",
    "`(char_code - key) % cc_n` reverse cyclic shifts the code by the key `5`.\n",
    "\n",
    "| Encryption                      |     |       |     |     |     |     |     |     | Decryption                      |\n",
    "| ------------------------------- | --- | ----- | --- | --- | --- | --- | --- | --- | ------------------------------- |\n",
    "| `char`                          | ... | **A** | B   | C   | D   | E   | F   | ... | `(chr(ord(char) - key) % cc_n)` |\n",
    "| `ord(char)`                     | ... | **65**| 66  | 67  | 68  | 69  | 70  | ... | `(ord(char) - key) % cc_n`      |\n",
    "| `(ord(char) + key) % cc_n`      | ... | **70**| 71  | 72  | 73  | 74  | 75  | ... | `ord(char)`                     |\n",
    "| `(chr(ord(char) + key) % cc_n)` | ... | **F** | G   | H   | I   | J   | K   | ... | `char`                          |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "493d21a1",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Exercise** Why did we set `cc_n = 1114112`? Explain whether the recoverability property may fail if we set `cc_n` to a bigger number or remove `% cc_n` for both `cc_encrypt_character` and `cc_decrypt_character`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93393d49",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "e73c497e70f1a86fe0ead6483b3a7d17",
     "grade": true,
     "grade_id": "modulo",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "YOUR ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "135a8d43",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Encrypt a plaintext and decrypt a ciphertext"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfdd7ca2",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Of course, it is more interesting to encrypt a string instead of a character. The following code implements this in one line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "69eccae8",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "def cc_encrypt(plaintext, key):\n",
    "    \"\"\"\n",
    "    Return the ciphertext of a plaintext by the key using Caesar cipher.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    plaintext: str\n",
    "        A unicode (UTF-8) message to be encrypted.\n",
    "    public_key: int\n",
    "        Public key to encrypt plaintext.\n",
    "    \"\"\"\n",
    "    return \"\".join([chr((ord(char) + key) % cc_n) for char in plaintext])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3eb50a3e",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The above function encrypts a message, referred to as the *plaintext*, by replacing each character with its encryption.  \n",
    "This is referred to as a [*substitution cipher*](https://en.wikipedia.org/wiki/Substitution_cipher)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8eeb9384",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Exercise** Define a function `cc_decrypt` that\n",
    "- takes a string `ciphertext` and an integer `key`, and\n",
    "- returns the plaintext that encrypts to `ciphertext` by the key using Caesar cipher."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "07a0b200",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "99cece2794f8673cbc4b9862ba6f38a4",
     "grade": false,
     "grade_id": "cc_decrypt",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "def cc_decrypt(ciphertext, key):\n",
    "    \"\"\"\n",
    "    Return the plaintext that encrypts to ciphertext by the key using Caesar cipher.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    ciphertext: str\n",
    "        message to be decrypted.\n",
    "    key: int\n",
    "        secret key to decrypt the ciphertext.\n",
    "    \"\"\"\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f61fc2e9",
   "metadata": {
    "code_folding": [],
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "94e426a0a59da6e0e8d1b8ff16f8eae5",
     "grade": true,
     "grade_id": "test-cc_decrypt",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NotImplementedError",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-9-8800b651ea96>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mcc_decrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mr\"bcdefghijklmnopqrstuvwxyz{\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"abcdefghijklmnopqrstuvwxyz\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mcc_decrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mr\"Mjqqt1%\\twqi&\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-8-8dbf6bca9c57>\u001b[0m in \u001b[0;36mcc_decrypt\u001b[0;34m(ciphertext, key)\u001b[0m\n\u001b[1;32m     11\u001b[0m     \"\"\"\n\u001b[1;32m     12\u001b[0m     \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 13\u001b[0;31m     \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m: "
     ]
    }
   ],
   "source": [
    "# tests\n",
    "assert cc_decrypt(r\"bcdefghijklmnopqrstuvwxyz{\", 1) == \"abcdefghijklmnopqrstuvwxyz\"\n",
    "assert cc_decrypt(r\"Mjqqt1%\\twqi&\", 5) == \"Hello, World!\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "fcf36815",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5d06619428a17c86edc6f46896203a67",
     "grade": true,
     "grade_id": "h_test-cc_decrypt",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "# hidden tests"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1366ee3",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Brute-force attack"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdae8437",
   "metadata": {},
   "source": [
    "### Create an English dictionary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b509fcda",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "You will launch a brute-force attack to guess the key that encrypts an English text. The idea is simple: \n",
    "\n",
    "- You try decrypting the ciphertext with different keys, and \n",
    "- see which of the resulting plaintexts make most sense (most english-like)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c003f7fb",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "To check whether a plaintext is English-like, we need to have a list of English words. One way is to type them out\n",
    "but this is tedious. Alternatively, we can obtain the list from the *Natural Language Toolkit (NLTK)*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "097de725",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package words to /home/cs1302/nltk_data...\n",
      "[nltk_data]   Package words is already up-to-date!\n"
     ]
    }
   ],
   "source": [
    "import nltk\n",
    "\n",
    "nltk.download(\"words\")\n",
    "from nltk.corpus import words"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2732a10b",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "`words.words()` returns a list of words. We can check whether a string is in the list using the operator `in`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "825eb290",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'Ada' in dictionary? True\n",
      "'ada' in dictionary? False\n",
      "'Hello' in dictionary? False\n",
      "'hello' in dictionary? True\n"
     ]
    }
   ],
   "source": [
    "for word in \"Ada\", \"ada\", \"Hello\", \"hello\":\n",
    "    print(\"{!r} in dictionary? {}\".format(word, word in words.words()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cebc98d4",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "However there are two issues:\n",
    "- Checking membership is slow for a long list.\n",
    "- Both 'Hello' and 'ada' are English-like but they are not in the words_list."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2440134",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Exercise** Using the method `lower` of `str` and the constructor `set`, assign `dictionary` to a set of lowercase English words from `words.words()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "afb32693",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c088033e9235e017fc574ee035ead4ba",
     "grade": false,
     "grade_id": "nltk",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NotImplementedError",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-13-15b94d1fa268>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m: "
     ]
    }
   ],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "02427c26",
   "metadata": {
    "code_folding": [],
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c74e6405fd76a61161c63ae219984445",
     "grade": true,
     "grade_id": "test-nltk",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'dictionary' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-14-c4c105658abf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdictionary\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdictionary\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m234377\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mword\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mword\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"ada\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mword\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mword\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"Ada\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"hola\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mNameError\u001b[0m: name 'dictionary' is not defined"
     ]
    }
   ],
   "source": [
    "# tests\n",
    "assert isinstance(dictionary, set) and len(dictionary) == 234377\n",
    "assert all(word in dictionary for word in (\"ada\", \"hello\"))\n",
    "assert all(word not in dictionary for word in (\"Ada\", \"hola\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e16864da",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "16fa129106e563d1db39c2e4889b23cd",
     "grade": true,
     "grade_id": "h_test-nltk",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'dictionary' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-15-ba715a9facfb>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# hidden tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0;31m### BEGIN TESTS\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0;34m\"world\"\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      4\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;34m\"mundo\"\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;31m### END TESTS\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mNameError\u001b[0m: name 'dictionary' is not defined"
     ]
    }
   ],
   "source": [
    "# hidden tests\n",
    "### BEGIN TESTS\n",
    "assert \"world\" in dictionary\n",
    "assert not \"mundo\" in dictionary\n",
    "### END TESTS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43e28b22",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Identify English-like text"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bee2f09f",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "To determine how English-like a text is, we calculate the following score:\n",
    "\n",
    "$$\n",
    "\\frac{\\text{number of English words in the text}}{\\text{number of tokens in the text}} \n",
    "$$\n",
    "where tokens are substrings (not necessarily an English word) separated by white space characters in the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "e4b697e6",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'dictionary' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-16-e9ed45c44ae3>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     12\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     13\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 14\u001b[0;31m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"hello world\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m<ipython-input-16-e9ed45c44ae3>\u001b[0m in \u001b[0;36mget_score\u001b[0;34m(text)\u001b[0m\n\u001b[1;32m      7\u001b[0m     \u001b[0;34m\"\"\"Returns the fraction of tokens which appear in dictionary.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      8\u001b[0m     \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m     \u001b[0mwords\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mtoken\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     10\u001b[0m     \u001b[0;32mreturn\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mwords\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-16-e9ed45c44ae3>\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m      7\u001b[0m     \u001b[0;34m\"\"\"Returns the fraction of tokens which appear in dictionary.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      8\u001b[0m     \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m     \u001b[0mwords\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mtoken\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdictionary\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     10\u001b[0m     \u001b[0;32mreturn\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mwords\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mNameError\u001b[0m: name 'dictionary' is not defined"
     ]
    }
   ],
   "source": [
    "def tokenizer(text):\n",
    "    \"\"\"Returns the list of tokens of the text.\"\"\"\n",
    "    return text.split()\n",
    "\n",
    "\n",
    "def get_score(text):\n",
    "    \"\"\"Returns the fraction of tokens which appear in dictionary.\"\"\"\n",
    "    tokens = tokenizer(text)\n",
    "    words = [token for token in tokens if token in dictionary]\n",
    "    return len(words) / len(tokens)\n",
    "\n",
    "\n",
    "# tests\n",
    "get_score(\"hello world\"), get_score(\"Hello, World!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f25cca2",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "As shown in tests above, the code fails to handle text with punctuations and uppercase letters properly.  \n",
    "In particular, \n",
    "- while `get_score` recognizes `hello world` as English-like and returns the maximum score of 1, \n",
    "- it fails to recognize `Hello, World!` as English-like and returns the minimum score of 0."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a16aed9",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Why? This is because every words in `dictionary`\n",
    "- are in lowercase, and\n",
    "- have no leading/trailing punctuations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd2e7c07",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Exercise** Define a funtion `tokenizer` that \n",
    "- takes a string `text` as an argument, and\n",
    "- returns a `list` of tokens obtained by\n",
    "  1. splitting `text` into a list using `split()`;\n",
    "  2. removing leading/trailing punctuations in `string.punctuation` using the `strip` method; and\n",
    "  3. converting all items of the list to lowercase using `lower()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "17072e3c",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "42c1cbeebeac28ba4e9cb3ebdd2fe391",
     "grade": false,
     "grade_id": "tokenizer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "import string\n",
    "\n",
    "\n",
    "def tokenizer(text):\n",
    "    \"\"\"Returns the list of tokens of the text such that\n",
    "    1) each token has no leading or training spaces/punctuations, and\n",
    "    2) all letters in each tokens are in lowercase.\"\"\"\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "0134b127",
   "metadata": {
    "code_folding": [],
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5852278c832b32a9b3dce50b7544ccb7",
     "grade": true,
     "grade_id": "test-tokenizer",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NotImplementedError",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-18-7a8577a4666a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"world\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0.99999\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mtokenizer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Do you know Jean-Pierre?\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m\"do\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"you\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"know\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"jean-pierre\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mget_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Do you know Jean-Pierre?\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0.99999\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-17-56a5db488d53>\u001b[0m in \u001b[0;36mtokenizer\u001b[0;34m(text)\u001b[0m\n\u001b[1;32m      7\u001b[0m     2) all letters in each tokens are in lowercase.\"\"\"\n\u001b[1;32m      8\u001b[0m     \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m     \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m: "
     ]
    }
   ],
   "source": [
    "# tests\n",
    "assert tokenizer(\"Hello, World!\") == [\"hello\", \"world\"]\n",
    "assert get_score(\"Hello, World!\") >= 0.99999\n",
    "assert tokenizer(\"Do you know Jean-Pierre?\") == [\"do\", \"you\", \"know\", \"jean-pierre\"]\n",
    "assert get_score(\"Do you know Jean-Pierre?\") >= 0.99999"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "7f0f4595",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "132a8343cac2a507fcb365095a1e4092",
     "grade": true,
     "grade_id": "h_test-tokenizer",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "# hidden tests"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea7cd782",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Launch a brute-force attack"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8524adb",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Exercise** Define the function `cc_attack` that \n",
    "- takes as arguments\n",
    "    - a string `ciphertext`,\n",
    "    - a floating point number `threshold` in the interval $(0,1)$ with a default value of $0.6$, and\n",
    "- returns a generator that  \n",
    "    - generates one-by-one in ascending order guesses of the key that\n",
    "    - decrypt `ciphertext` to texts with scores at least the `threshold`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "e1581b44",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "074c5611790ab330bc0366632c0247bf",
     "grade": false,
     "grade_id": "cc_attack",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "def cc_attack(ciphertext, threshold=0.6):\n",
    "    \"\"\"Returns a generator that generates the next guess of the key that\n",
    "    decrypts the ciphertext to a text with get_score(text) at least the threshold.\n",
    "    \"\"\"\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "748bbce0",
   "metadata": {
    "code_folding": [],
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a6b4220fb893ff497a21db970fcb85af",
     "grade": true,
     "grade_id": "test-cc_attack",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NotImplementedError",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-21-320bb4a5e254>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0mciphertext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcc_encrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Hello, World!\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m12345\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mkey_generator\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcc_attack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mciphertext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      4\u001b[0m \u001b[0mkey_guess\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey_generator\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mkey_guess\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m12345\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-20-54871c2021f4>\u001b[0m in \u001b[0;36mcc_attack\u001b[0;34m(ciphertext, threshold)\u001b[0m\n\u001b[1;32m      4\u001b[0m     \"\"\"\n\u001b[1;32m      5\u001b[0m     \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m     \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m: "
     ]
    }
   ],
   "source": [
    "# tests\n",
    "ciphertext = cc_encrypt(\"Hello, World!\", 12345)\n",
    "key_generator = cc_attack(ciphertext)\n",
    "key_guess = next(key_generator)\n",
    "assert key_guess == 12345\n",
    "text = cc_decrypt(ciphertext, key_guess)\n",
    "print(\n",
    "    \"guess of the key: {}\\nscore: {}\\ntext :{}\".format(key_guess, get_score(text), text)\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "3d3f879c",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "104ff78202eb48001f635f1b22d55872",
     "grade": true,
     "grade_id": "h_test-cc_attack",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "# hidden tests"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b909b80",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Challenge"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66f274dd",
   "metadata": {},
   "source": [
    "Another symmetric key cipher is [columnar transposition cipher](https://en.wikipedia.org/wiki/Transposition_cipher#Columnar_transposition). A transposition cipher encrypts a text by permuting instead of substituting characters."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd59af11",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Exercise** Study and implement the irregular case of the [columnar transposition cipher](https://en.wikipedia.org/wiki/Transposition_cipher#Columnar_transposition) as described in Wikipedia page. Define the functions \n",
    "- `ct_encrypt(plaintext, key)` for encryption, and \n",
    "- `ct_decrypt(ciphertext, key)` for decryption. \n",
    "\n",
    "You can assume the plaintext is in uppercase and has no spaces/punctuations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7eae76fc",
   "metadata": {},
   "source": [
    "*Hints:* See the text cases for an example of `plaintext`, `key`, and the corresponding `ciphertext`. You can but are not required to follow the solution template below:\n",
    "\n",
    "```Python\n",
    "def argsort(seq):\n",
    "    '''A helper function that returns the tuple of indices that would sort the\n",
    "    sequence seq.'''\n",
    "    return tuple(x[0] for x in sorted(enumerate(seq), key=lambda x: x[1]))\n",
    "\n",
    "\n",
    "def ct_idx(length, key):\n",
    "    '''A helper function that returns the tuple of indices that would permute \n",
    "    the letters of a message according to the key using the irregular case of \n",
    "    columnar transposition cipher.'''\n",
    "    seq = tuple(range(length))\n",
    "    return [i for j in argsort(key) for i in _______________]\n",
    "\n",
    "\n",
    "def ct_encrypt(plaintext, key):\n",
    "    \"\"\"\n",
    "    Return the ciphertext of a plaintext by the key using the irregular case\n",
    "    of columnar transposition cipher.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    plaintext: str\n",
    "        a message in uppercase without punctuations/spaces.\n",
    "    key: str\n",
    "        secret key to encrypt plaintext.\n",
    "    \"\"\"\n",
    "    return ''.join([plaintext[i] for i in ct_idx(len(plaintext), key)])\n",
    "\n",
    "\n",
    "def ct_decrypt(ciphertext, key):\n",
    "    \"\"\"\n",
    "    Return the plaintext of the ciphertext by the key using the irregular case\n",
    "    of columnar transposition cipher.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    ciphertext: str\n",
    "        a string in uppercase without punctuations/spaces.\n",
    "    key: str\n",
    "        secret key to decrypt ciphertext.\n",
    "    \"\"\"\n",
    "    return _______________________________________________________________________\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "f63c9722",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "98df17cad7f85438965c855febc2b736",
     "grade": false,
     "grade_id": "ct",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NotImplementedError",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-23-15b94d1fa268>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNotImplementedError\u001b[0m: "
     ]
    }
   ],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "f7ea0245",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a893a09904c9aae29dafdd12e68c7318",
     "grade": true,
     "grade_id": "test-ct",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'ct_encrypt' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-24-7c0ef3dfeaa3>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0mplaintext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"WEAREDISCOVEREDFLEEATONCE\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0mciphertext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"EVLNACDTESEAROFODEECWIREE\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mct_encrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mplaintext\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mciphertext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      6\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mct_decrypt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mciphertext\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mplaintext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mNameError\u001b[0m: name 'ct_encrypt' is not defined"
     ]
    }
   ],
   "source": [
    "# tests\n",
    "key = \"ZEBRAS\"\n",
    "plaintext = \"WEAREDISCOVEREDFLEEATONCE\"\n",
    "ciphertext = \"EVLNACDTESEAROFODEECWIREE\"\n",
    "assert ct_encrypt(plaintext, key) == ciphertext\n",
    "assert ct_decrypt(ciphertext, key) == plaintext"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "bcc6c4be",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "08ae4d5bf0ccdaf09bd2ac518795f261",
     "grade": true,
     "grade_id": "h_test-ct",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": [
     "hide-input",
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "# hidden tests"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "text_representation": {
    "extension": ".md",
    "format_name": "myst",
    "format_version": 0.13,
    "jupytext_version": "1.10.3"
   }
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "source_map": [
   14,
   18,
   23,
   29,
   33,
   37,
   47,
   51,
   55,
   59,
   85,
   89,
   97,
   112,
   116,
   125,
   129,
   138,
   145,
   149,
   171,
   175,
   183,
   195,
   199,
   203,
   207,
   211,
   230,
   235,
   241,
   272,
   296,
   315,
   319,
   323,
   330,
   335,
   346,
   350,
   359,
   365,
   369,
   389,
   414,
   437,
   441,
   450,
   471,
   478,
   484,
   493,
   520,
   546,
   565,
   569,
   579,
   603,
   633,
   652,
   656,
   660,
   668,
   717,
   737,
   763
  ]
 },
 "nbformat": 4,
 "nbformat_minor": 5
}