{ "cells": [ { "cell_type": "markdown", "id": "cfa38a11", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Pandas" ] }, { "cell_type": "markdown", "id": "b84f4b40", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "remove-cell" ] }, "source": [ "**CS1302 Introduction to Computer Programming**\n", "___" ] }, { "cell_type": "markdown", "id": "46469f05", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In this lab, we will analyze COVID-19 data using a powerful package called [`pandas`](https://pandas.pydata.org/docs/user_guide/index.html). \n", "The package name comes from *panel data* and *Python for data analysis*." ] }, { "cell_type": "markdown", "id": "bb234f78", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Loading CSV Files with Pandas" ] }, { "cell_type": "markdown", "id": "50e88b34", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "[DATA.GOV.HK](https://data.gov.hk/en-data/dataset/hk-dh-chpsebcddr-novel-infectious-agent) provides an [API](https://data.gov.hk/en/help/api-spec#historicalAPI) to retrieve historical data on COVID-19 cases in Hong Kong." ] }, { "cell_type": "markdown", "id": "b8b6b12f", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The following uses the `urlencode` function to create the URL that links to a CSV file containing probable and confirmed cases of COVID-19 as of Aug 1, 2020." 
] }, { "cell_type": "code", "execution_count": 1, "id": "79e0bfc1", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://api.data.gov.hk/v1/historical-archive/get-file?url=http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv&time=20200801-1204\n" ] } ], "source": [ "from urllib.parse import urlencode\n", "\n", "url_data_gov_hk_get = \"https://api.data.gov.hk/v1/historical-archive/get-file\"\n", "url_covid_csv = \"http://www.chp.gov.hk/files/misc/enhanced_sur_covid_19_eng.csv\"\n", "time = \"20200801-1204\"\n", "url_covid = url_data_gov_hk_get + \"?\" + urlencode({\"url\": url_covid_csv, \"time\": time})\n", "\n", "print(url_covid)" ] }, { "cell_type": "markdown", "id": "8a3841ea", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "````{tip}\n", "\n", "`urlencode` creates a string `'url=<...>&time=<...>'` with some [special symbols encoded](https://www.w3schools.com/tags/ref_urlencode.ASP), e.g.:\n", "- `:` is replaced by `%3A`, and\n", "- `/` is replaced by `%2F`.\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "8664000c", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Exercise** Write a function `simple_encode` that takes in a string and returns a string with `:` and `/` encoded as described above." 
] }, { "cell_type": "code", "execution_count": 2, "id": "72412e2b", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "fcc5ddee23fc2282a337b2ae4e443334", "grade": false, "grade_id": "simple_encode", "locked": false, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output" ] }, "outputs": [], "source": [ "def simple_encode(string):\n", " \"\"\"Returns the string with : and / encoded to %3A and %2F respectively.\"\"\"\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "204de450", "metadata": {}, "source": [ "````{hint} \n", "\n", "Use the `replace` method of `str`.\n", "\n", "````" ] }, { "cell_type": "code", "execution_count": 3, "id": "5b2df6c7", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "69acab1e962a5e893f87449880b0fb7d", "grade": true, "grade_id": "test-simple_encode", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "hide-input", "remove-output" ] }, "outputs": [ { "ename": "NotImplementedError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m assert (\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0msimple_encode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"http://www.chp.gov.hk/files/misc/enhanced_sur_covid_19_eng.csv\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;34m==\u001b[0m 
\u001b[0;34m\"http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m )\n", "\u001b[0;32m\u001b[0m in \u001b[0;36msimple_encode\u001b[0;34m(string)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\"\"\"Returns the string with : and / encoded to %3A and %2F respectively.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m: " ] } ], "source": [ "# tests\n", "assert (\n", " simple_encode(\"http://www.chp.gov.hk/files/misc/enhanced_sur_covid_19_eng.csv\")\n", " == \"http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv\"\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "id": "c7fb734b", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "f435b2f22350119a1ccbb85254d296bb", "grade": true, "grade_id": "htest-simple_encode", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# hidden tests" ] }, { "cell_type": "markdown", "id": "df7e7d32", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Just as the built-in function `open` opens a file for reading, `pandas` provides a function `read_csv` that loads a CSV file into memory. The file can even reside on the web:" ] }, { "cell_type": "code", "execution_count": 5, "id": "4587322e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Case no.</th><th>Report date</th><th>Date of onset</th><th>Gender</th><th>Age</th><th>Name of hospital admitted</th><th>Hospitalised/Discharged/Deceased</th><th>HK/Non-HK resident</th><th>Case classification*</th><th>Confirmed/probable</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>23/01/2020</td><td>21/01/2020</td><td>M</td><td>39</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>23/01/2020</td><td>18/01/2020</td><td>M</td><td>56</td><td>NaN</td><td>Discharged</td><td>HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>24/01/2020</td><td>20/01/2020</td><td>F</td><td>62</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>24/01/2020</td><td>23/01/2020</td><td>F</td><td>62</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>24/01/2020</td><td>23/01/2020</td><td>M</td><td>63</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>3268</th><td>3269</td><td>31/07/2020</td><td>26/07/2020</td><td>M</td><td>22</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3269</th><td>3270</td><td>31/07/2020</td><td>28/07/2020</td><td>F</td><td>31</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Epidemiologically linked with local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3270</th><td>3271</td><td>31/07/2020</td><td>Asymptomatic</td><td>F</td><td>36</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Epidemiologically linked with local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3271</th><td>3272</td><td>31/07/2020</td><td>Pending</td><td>F</td><td>22</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3272</th><td>3273</td><td>31/07/2020</td><td>28/07/2020</td><td>M</td><td>68</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Epidemiologically linked with local case</td><td>Confirmed</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>3273 rows × 10 columns</p>\n",
"</div>
" ], "text/plain": [ " Case no. Report date Date of onset Gender Age \\\n", "0 1 23/01/2020 21/01/2020 M 39 \n", "1 2 23/01/2020 18/01/2020 M 56 \n", "2 3 24/01/2020 20/01/2020 F 62 \n", "3 4 24/01/2020 23/01/2020 F 62 \n", "4 5 24/01/2020 23/01/2020 M 63 \n", "... ... ... ... ... ... \n", "3268 3269 31/07/2020 26/07/2020 M 22 \n", "3269 3270 31/07/2020 28/07/2020 F 31 \n", "3270 3271 31/07/2020 Asymptomatic F 36 \n", "3271 3272 31/07/2020 Pending F 22 \n", "3272 3273 31/07/2020 28/07/2020 M 68 \n", "\n", " Name of hospital admitted Hospitalised/Discharged/Deceased \\\n", "0 NaN Discharged \n", "1 NaN Discharged \n", "2 NaN Discharged \n", "3 NaN Discharged \n", "4 NaN Discharged \n", "... ... ... \n", "3268 NaN To be provided \n", "3269 NaN To be provided \n", "3270 NaN To be provided \n", "3271 NaN To be provided \n", "3272 NaN To be provided \n", "\n", " HK/Non-HK resident Case classification* \\\n", "0 Non-HK resident Imported case \n", "1 HK resident Imported case \n", "2 Non-HK resident Imported case \n", "3 Non-HK resident Imported case \n", "4 Non-HK resident Imported case \n", "... ... ... \n", "3268 HK Resident Local case \n", "3269 HK Resident Epidemiologically linked with local case \n", "3270 HK Resident Epidemiologically linked with local case \n", "3271 HK Resident Local case \n", "3272 HK Resident Epidemiologically linked with local case \n", "\n", " Confirmed/probable \n", "0 Confirmed \n", "1 Confirmed \n", "2 Confirmed \n", "3 Confirmed \n", "4 Confirmed \n", "... ... 
\n", "3268 Confirmed \n", "3269 Confirmed \n", "3270 Confirmed \n", "3271 Confirmed \n", "3272 Confirmed \n", "\n", "[3273 rows x 10 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df_covid = pd.read_csv(url_covid)\n", "\n", "print(type(df_covid))\n", "df_covid" ] }, { "cell_type": "markdown", "id": "9951caf8", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "````{tip}\n", "\n", "The above creates a [`DataFrame` object](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html?highlight=dataframe#pandas.DataFrame): \n", "- The content of the CSV file is conveniently displayed as an HTML table. \n", "- We can control how much information to show by setting the [display options](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html).\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "c40b75ac", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Exercise** Using the function `pd.read_csv`, load `building_list_eng.csv` as `df_building` from the URL `url_building`." 
] }, { "cell_type": "code", "execution_count": 6, "id": "61ce32cd", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "d45e94c9d1bc90db12e8d75eb74673d3", "grade": false, "grade_id": "df_building", "locked": false, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output" ] }, "outputs": [ { "ename": "NotImplementedError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m )\n\u001b[1;32m 6\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mdf_building\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNotImplementedError\u001b[0m: " ] } ], "source": [ "url_building_csv = \"http://www.chp.gov.hk/files/misc/building_list_eng.csv\"\n", "time = \"20200801-1203\"\n", "url_building = (\n", " url_data_gov_hk_get + \"?\" + urlencode({\"url\": url_building_csv, \"time\": time})\n", ")\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "df_building" ] }, { "cell_type": "code", "execution_count": 7, "id": "fc2fa2ca", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "f723c297315d804d79a6408f22fd1ea2", "grade": true, "grade_id": "test-df_building", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output", "hide-input" ] }, "outputs": [ 
{ "ename": "NameError", "evalue": "name 'df_building' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m assert all(\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdf_building\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m == [\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\"District\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'df_building' is not defined" ] } ], "source": [ "# tests\n", "assert all(\n", " df_building.columns\n", " == [\n", " \"District\",\n", " \"Building name\",\n", " \"Last date of residence of the case(s)\",\n", " \"Related probable/confirmed cases\",\n", " ]\n", ") # check column names" ] }, { "cell_type": "code", "execution_count": 8, "id": "137c16b9", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ea66059af48263d1d8b23cbcf48e7637", "grade": true, "grade_id": "htest-df_building", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# hidden tests" ] }, { "cell_type": "markdown", "id": "d9f03798", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Selecting and Removing Columns" ] }, { "cell_type": "markdown", "id": "37bdd895", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can obtain the column labels of a `DataFrame` using its `columns` attribute." 
] }, { "cell_type": "code", "execution_count": 9, "id": "cb7fb1b0", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "Index(['Case no.', 'Report date', 'Date of onset', 'Gender', 'Age',\n", " 'Name of hospital admitted', 'Hospitalised/Discharged/Deceased',\n", " 'HK/Non-HK resident', 'Case classification*', 'Confirmed/probable'],\n", " dtype='object')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_covid.columns" ] }, { "cell_type": "markdown", "id": "1ac13f34", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using the indexing operator `[]`, a column of a `DataFrame` can be returned as a [`Series` object](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html), which is essentially a named array. \n", "We can further use the method `value_counts` to return the counts of the distinct values as another `Series` object." ] }, { "cell_type": "code", "execution_count": 10, "id": "348a6309", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.series.Series'>\n" ] }, { "data": { "text/plain": [ "F 1648\n", "M 1625\n", "Name: Gender, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series_gender_counts = df_covid[\n", " \"Gender\"\n", "].value_counts() # return the number of male and female cases\n", "\n", "print(type(series_gender_counts))\n", "series_gender_counts" ] }, { "cell_type": "markdown", "id": "ffe93085", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Exercise** For `df_building`, use the operator `[]` and the method `value_counts` to assign to `series_district_counts` a `Series` object that stores the counts of buildings in different districts." 
] }, { "cell_type": "code", "execution_count": 11, "id": "d3390057", "metadata": { "code_folding": [], "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "b9589e1b6a5cec1ed3814a03a688a2df", "grade": false, "grade_id": "series_district_counts", "locked": false, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output" ] }, "outputs": [ { "ename": "NotImplementedError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mseries_district_counts\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNotImplementedError\u001b[0m: " ] } ], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "series_district_counts" ] }, { "cell_type": "code", "execution_count": 12, "id": "d532dd82", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "4fe69462a446c737e8604c053360475a", "grade": true, "grade_id": "test-series_district_counts", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output", "hide-input" ] }, "outputs": [ { "ename": "NameError", "evalue": "name 'series_district_counts' is not defined", "output_type": "error", "traceback": [ 
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseries_district_counts\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"Wong Tai Sin\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"Kwun Tong\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m313\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m212\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'series_district_counts' is not defined" ] } ], "source": [ "# tests\n", "assert all(series_district_counts[[\"Wong Tai Sin\", \"Kwun Tong\"]] == [313, 212])" ] }, { "cell_type": "code", "execution_count": 13, "id": "a1021233", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "76ed779a7bcd6dd420ec33444bf1a743", "grade": true, "grade_id": "htest-series_district_counts", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# hidden tests" ] }, { "cell_type": "markdown", "id": "458a65f0", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In `df_covid`, it appears that the column `Name of hospital admitted` contains no information. We can confirm this by\n", "1. returning the column as a `Series` with `df_covid['Name of hospital admitted']`, and\n", "1. printing an array of unique column values using the method `unique`." 
] }, { "cell_type": "code", "execution_count": 14, "id": "42c0fa2b", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "array([nan])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_covid[\"Name of hospital admitted\"].unique()" ] }, { "cell_type": "markdown", "id": "23b579c0", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Exercise** Drop the column `Name of hospital admitted` from `df_covid` using the `drop` method of the DataFrame." ] }, { "cell_type": "code", "execution_count": 15, "id": "b6628ca9", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "92ec259cb0de77ebe185cbbc17d58395", "grade": false, "grade_id": "drop", "locked": false, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output" ] }, "outputs": [ { "ename": "NotImplementedError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mdf_covid\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNotImplementedError\u001b[0m: " ] } ], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "df_covid" ] }, { "cell_type": "markdown", "id": "f0675ce2", "metadata": {}, "source": [ "````{hint}\n", "\n", "Consider reading the documentation of the `drop` method for \n", 
"- mutating `df_covid` in place instead of creating a copy of the DataFrame with the column dropped, but\n", "- suppressing error when dropping a column that does not exist or has already been dropped.\n", "\n", "````" ] }, { "cell_type": "code", "execution_count": 16, "id": "3af70517", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "f924628b48e99a070a9ef0f25b297aad", "grade": true, "grade_id": "test-drop", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "hide-input", "remove-output" ] }, "outputs": [ { "ename": "ValueError", "evalue": "('Shapes must match', (10,), (9,))", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m assert all(\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mdf_covid\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m == [\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\"Case no.\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/my-conda-envs/jb/lib/python3.8/site-packages/pandas/core/ops/common.py\u001b[0m in \u001b[0;36mnew_method\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 63\u001b[0m \u001b[0mother\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mitem_from_zerodim\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 64\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 65\u001b[0;31m \u001b[0;32mreturn\u001b[0m 
\u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 66\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 67\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mnew_method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/my-conda-envs/jb/lib/python3.8/site-packages/pandas/core/arraylike.py\u001b[0m in \u001b[0;36m__eq__\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 27\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0munpack_zerodim_and_defer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"__eq__\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__eq__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 29\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cmp_method\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0meq\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 30\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 31\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0munpack_zerodim_and_defer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"__ne__\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/my-conda-envs/jb/lib/python3.8/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36m_cmp_method\u001b[0;34m(self, other, op)\u001b[0m\n\u001b[1;32m 5628\u001b[0m \u001b[0;31m# don't pass MultiIndex\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5629\u001b[0m 
\u001b[0;32mwith\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0merrstate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mall\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"ignore\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 5630\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcomp_method_OBJECT_ARRAY\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mop\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_values\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5631\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5632\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mis_interval_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/my-conda-envs/jb/lib/python3.8/site-packages/pandas/core/ops/array_ops.py\u001b[0m in \u001b[0;36mcomp_method_OBJECT_ARRAY\u001b[0;34m(op, x, y)\u001b[0m\n\u001b[1;32m 50\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 51\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 52\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Shapes must match\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 53\u001b[0m 
\u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlibops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvec_compare\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mravel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mravel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 54\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: ('Shapes must match', (10,), (9,))" ] } ], "source": [ "# tests\n", "assert all(\n", " df_covid.columns\n", " == [\n", " \"Case no.\",\n", " \"Report date\",\n", " \"Date of onset\",\n", " \"Gender\",\n", " \"Age\",\n", " \"Hospitalised/Discharged/Deceased\",\n", " \"HK/Non-HK resident\",\n", " \"Case classification*\",\n", " \"Confirmed/probable\",\n", " ]\n", ")" ] }, { "cell_type": "markdown", "id": "be0cd0b1", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Selecting Rows of DataFrame" ] }, { "cell_type": "markdown", "id": "8bf5da05", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can select the confirmed male cases using the attribute [`loc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) and the indexing operator `[]`." ] }, { "cell_type": "code", "execution_count": 17, "id": "e77f8217", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Case no.</th><th>Report date</th><th>Date of onset</th><th>Gender</th><th>Age</th><th>Name of hospital admitted</th><th>Hospitalised/Discharged/Deceased</th><th>HK/Non-HK resident</th><th>Case classification*</th><th>Confirmed/probable</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>23/01/2020</td><td>21/01/2020</td><td>M</td><td>39</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>23/01/2020</td><td>18/01/2020</td><td>M</td><td>56</td><td>NaN</td><td>Discharged</td><td>HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>24/01/2020</td><td>23/01/2020</td><td>M</td><td>63</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>5</th><td>6</td><td>26/01/2020</td><td>21/01/2020</td><td>M</td><td>47</td><td>NaN</td><td>Discharged</td><td>HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>7</th><td>8</td><td>26/01/2020</td><td>25/01/2020</td><td>M</td><td>64</td><td>NaN</td><td>Discharged</td><td>Non-HK resident</td><td>Imported case</td><td>Confirmed</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>3264</th><td>3265</td><td>31/07/2020</td><td>25/07/2020</td><td>M</td><td>54</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3265</th><td>3266</td><td>31/07/2020</td><td>30/07/2020</td><td>M</td><td>69</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Epidemiologically linked with local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3267</th><td>3268</td><td>31/07/2020</td><td>23/07/2020</td><td>M</td><td>61</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Epidemiologically linked with local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3268</th><td>3269</td><td>31/07/2020</td><td>26/07/2020</td><td>M</td><td>22</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Local case</td><td>Confirmed</td></tr>\n",
"    <tr><th>3272</th><td>3273</td><td>31/07/2020</td><td>28/07/2020</td><td>M</td><td>68</td><td>NaN</td><td>To be provided</td><td>HK Resident</td><td>Epidemiologically linked with local case</td><td>Confirmed</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>1624 rows × 10 columns</p>\n",
"</div>
" ], "text/plain": [ " Case no. Report date Date of onset Gender Age \\\n", "0 1 23/01/2020 21/01/2020 M 39 \n", "1 2 23/01/2020 18/01/2020 M 56 \n", "4 5 24/01/2020 23/01/2020 M 63 \n", "5 6 26/01/2020 21/01/2020 M 47 \n", "7 8 26/01/2020 25/01/2020 M 64 \n", "... ... ... ... ... ... \n", "3264 3265 31/07/2020 25/07/2020 M 54 \n", "3265 3266 31/07/2020 30/07/2020 M 69 \n", "3267 3268 31/07/2020 23/07/2020 M 61 \n", "3268 3269 31/07/2020 26/07/2020 M 22 \n", "3272 3273 31/07/2020 28/07/2020 M 68 \n", "\n", " Name of hospital admitted Hospitalised/Discharged/Deceased \\\n", "0 NaN Discharged \n", "1 NaN Discharged \n", "4 NaN Discharged \n", "5 NaN Discharged \n", "7 NaN Discharged \n", "... ... ... \n", "3264 NaN To be provided \n", "3265 NaN To be provided \n", "3267 NaN To be provided \n", "3268 NaN To be provided \n", "3272 NaN To be provided \n", "\n", " HK/Non-HK resident Case classification* \\\n", "0 Non-HK resident Imported case \n", "1 HK resident Imported case \n", "4 Non-HK resident Imported case \n", "5 HK resident Imported case \n", "7 Non-HK resident Imported case \n", "... ... ... \n", "3264 HK Resident Local case \n", "3265 HK Resident Epidemiologically linked with local case \n", "3267 HK Resident Epidemiologically linked with local case \n", "3268 HK Resident Local case \n", "3272 HK Resident Epidemiologically linked with local case \n", "\n", " Confirmed/probable \n", "0 Confirmed \n", "1 Confirmed \n", "4 Confirmed \n", "5 Confirmed \n", "7 Confirmed \n", "... ... 
\n", "3264 Confirmed \n", "3265 Confirmed \n", "3267 Confirmed \n", "3268 Confirmed \n", "3272 Confirmed \n", "\n", "[1624 rows x 10 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_confirmed_male = df_covid.loc[\n", "    (df_covid[\"Confirmed/probable\"] == \"Confirmed\") & (df_covid[\"Gender\"] == \"M\")\n", "]\n", "print(type(df_covid.loc))\n", "df_confirmed_male" ] }, { "cell_type": "markdown", "id": "c101b347", "metadata": {}, "source": [ "````{tip}\n", "\n", "`loc` returns an indexer object whose `__getitem__` method supports [advanced indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html#advanced-indexing). In particular, the code above uses [boolean indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html#boolean-array-indexing): each comparison produces a boolean mask, `&` combines the two masks elementwise, and only the rows where the combined mask is `True` are selected.\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "8e530cfc", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Exercise** Assign `df_confirmed_local` to a `DataFrame` of confirmed cases that are local or epidemiologically linked with a local case."
] }, { "cell_type": "code", "execution_count": 18, "id": "583da3d8", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "617372bff993b2f7f7810f2d97a9d106", "grade": false, "grade_id": "df_confirmed_local", "locked": false, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output" ] }, "outputs": [ { "ename": "NotImplementedError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mdf_confirmed_local\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNotImplementedError\u001b[0m: " ] } ], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "df_confirmed_local" ] }, { "cell_type": "code", "execution_count": 19, "id": "20f12a50", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "1291e901430037b15777d2fe481cddbb", "grade": true, "grade_id": "test-df_confirmed_local", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "slideshow": { "slide_type": "fragment" }, "tags": [ "remove-output", "hide-input" ] }, "outputs": [ { "ename": "NameError", "evalue": "name 'df_confirmed_local' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m assert set(df_confirmed_local[\"Case classification*\"].unique()) == {\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;34m\"Epidemiologically linked with local case\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\"Local case\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m }\n", "\u001b[0;31mNameError\u001b[0m: name 'df_confirmed_local' is not defined" ] } ], "source": [ "# tests\n", "assert set(df_confirmed_local[\"Case classification*\"].unique()) == {\n", " \"Epidemiologically linked with local case\",\n", " \"Local case\",\n", "}" ] }, { "cell_type": "code", "execution_count": 20, "id": "d8071e84", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "285c0f24376b876b8f458e07a05563df", "grade": true, "grade_id": "htest-df_confirmed_local", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# hidden tests" ] }, { "cell_type": "markdown", "id": "44806c3f", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Exercise** Write a function `case_counts` that \n", "- takes an argument `district`, and\n", "- returns the number of cases in `district`." 
] }, { "cell_type": "code", "execution_count": 21, "id": "1390b928", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "86188136c62558ec15f61d47cfbbdd7a", "grade": false, "grade_id": "case_counts", "locked": false, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output" ] }, "outputs": [], "source": [ "def case_counts(district):\n", "    # YOUR CODE HERE\n", "    raise NotImplementedError()" ] }, { "cell_type": "markdown", "id": "ce0b2eda", "metadata": {}, "source": [ "````{hint}\n", "\n", "Note that a building can be associated with more than one case, and a case can be associated with more than one building. You may want to use the `split` and `strip` methods of `str` to obtain a list of cases from the `DataFrame`.\n", "\n", "````" ] }, { "cell_type": "code", "execution_count": 22, "id": "5507c8fa", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "721a790b1a8a2344baaa39eab502b96a", "grade": true, "grade_id": "test-case_counts", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "slideshow": { "slide_type": "-" }, "tags": [ "remove-output", "hide-input" ] }, "outputs": [ { "ename": "NotImplementedError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# tests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mcase_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Kwai Tsing\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m
\u001b[0;36m109\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mcase_counts\u001b[0;34m(district)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mcase_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdistrict\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# YOUR CODE HERE\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNotImplementedError\u001b[0m: " ] } ], "source": [ "# tests\n", "assert case_counts(\"Kwai Tsing\") == 109" ] }, { "cell_type": "code", "execution_count": 23, "id": "1a626470", "metadata": { "code_folding": [ 0 ], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e73f4d8c0f1732551ea7ffa66af9fa1d", "grade": true, "grade_id": "htest-case_counts", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# hidden tests" ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.11.5" } }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "source_map": [ 14, 18, 23, 28, 32, 36, 40, 55, 65, 69, 89, 97, 123, 143, 147, 160, 170, 174, 200, 231, 251, 255, 259, 267, 272, 285, 289, 311, 334, 354, 360, 368, 372, 391, 401, 435, 439, 443, 453, 461, 465, 486, 512, 
532, 538, 557, 565, 588 ] }, "nbformat": 4, "nbformat_minor": 5 }