City University of Hong Kong
Department of Computer Science
Artificial Intelligence -- Past, Present, and Future
Semester A, 2022/23
This is a 3-credit course.
This AI course is suitable for both technical and non-technical students. Its
first aim is to provide an overview of what AI is, how it has developed over the
past decades, its current trends, and its potential future directions. It also
covers the impact of AI on society and business. Through case studies, students
gain a better insight into different AI technologies and how they can be used to
address a wide range of social and business needs. The course will broaden
students' understanding of the current state of the art in AI and of future
trends, as well as of how the needs of different industries can be addressed
through innovative uses of AI. The second objective of this course is to help
students become creative innovators who apply "AI first" concepts to solving
real-world problems through project-based work. This course will be useful for
students from any discipline and will give insight into the value of AI across
industries from a global point of view, as well as into issues related to its
ethical use. To make the course accessible to students from any background, no
programming is required and no prior programming skills are assumed.
Textbook
There is no textbook for the course. All teaching materials will be from
online sources.
Instructor:
Dr. Dapeng Wu
Office: Y6307, AC-1 Building
Email: dapengwu@cityu.edu.hk
TA:
1) Weiwei Fu
Email: weiweifu2-c@my.cityu.edu.hk
2) Meng Xu
Email: mxu247-c@my.cityu.edu.hk
3) Zihao Wen
Email: zihaowen2-c@my.cityu.edu.hk
4) Haifeng Guo
Email: haifenguo2-c@my.cityu.edu.hk
5) Jinpeng Chen
Email: jinpechen2-c@my.cityu.edu.hk
6) Yinglan Feng
Email: yinglfeng2-c@my.cityu.edu.hk
Course website: https://www.cs.cityu.edu.hk/~dapengwu/courses/GE2340f22
Meeting Time for Lectures
Monday, 3 pm - 5:50 pm
Meeting Room for Lectures
LT 5 (on Floor 4), AC-1 Building
Meeting Weeks for Tutorials
Tutorials will be given in Room 4412, AC-2 Building, from the 4th week through
the 11th week (i.e., from Sept. 19 to Nov. 11), for a total of 8 tutorials. There
are three tutorial sessions, held at the following times:
- 17:00-17:50, Tuesday, instructors: Zihao Wen (lead), Haifeng Guo
- 18:00-18:50, Tuesday, instructors: Meng Xu (lead), Jinpeng Chen
- 18:00-18:50, Friday, instructors: Weiwei Fu (lead), Yinglan Feng
You only need to attend one session since the three sessions cover the same
teaching materials.

Course Policies
- During lectures, cell phones should be set to silent mode.
- Late submissions of homework solutions and project reports are not allowed
unless the instructor has granted permission in advance.

Grading:
| Component | Percentage | Due date |
|---|---|---|
| Weekly quiz | 10% | In class |
| Reading report | 10% | 4pm, Oct. 10 |
| Term project | 40% | 4pm, Nov. 26 |
| Final exam | 40% | Dec. 5-17 |
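As a worked example of how the weights in the grading table combine, the short Python sketch below computes a weighted course total. It assumes every component is scored out of 100; the component keys and the sample scores are hypothetical, and the course's actual rounding and letter-grade boundaries are not covered here.

```python
from typing import Dict

# Weights from the grading table (10% + 10% + 40% + 40% = 100%).
WEIGHTS: Dict[str, float] = {
    "weekly_quiz": 0.10,
    "reading_report": 0.10,
    "term_project": 0.40,
    "final_exam": 0.40,
}

def course_total(scores: Dict[str, float]) -> float:
    """Weighted course total on a 0-100 scale, assuming each component is scored out of 100."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores, for illustration only.
print(course_total({"weekly_quiz": 80, "reading_report": 90,
                    "term_project": 75, "final_exam": 70}))  # prints 75.0
```

Because the term project and the final exam each carry 40%, a 10-point change in either moves the total by 4 points, while the same change in a quiz or the reading report moves it by only 1 point.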
Class Project:
The class project will be done individually. Each student is expected to write a
report documenting his/her research, critical comparison and analysis, and new
ideas.
For details about the project, please read here.
Suggested topics for projects are listed here.


Useful links
- Anaconda: Anaconda is the
leading open data science platform powered by Python.
- Theano:
Theano is a Python library that lets you define, optimize, and evaluate
mathematical expressions, especially ones with multi-dimensional arrays (numpy.ndarray).
- TensorFlow:
TensorFlow is an open source software library for numerical computation using
data flow graphs. Nodes in the graph represent mathematical operations, while
the graph edges represent the multidimensional data arrays (tensors)
communicated between them. The flexible architecture allows you to deploy
computation to one or more CPUs or GPUs in a desktop, server, or mobile device
with a single API.
- Keras: Keras is a minimalist,
highly modular neural networks library, written in Python and capable of
running on top of either TensorFlow or Theano. It was developed with a focus
on enabling fast experimentation. Being able to go from idea to result with
the least possible delay is key to doing good research.
- PyTorch: PyTorch is a deep learning framework for fast, flexible experimentation.
- A curated list of resources dedicated to recurrent neural networks
- Use the Keras platform to implement handwritten digit recognition with a
multi-layer perceptron: [link]
- Source code in PyTorch for handwritten digit recognition, using 2D
convolutional neural networks
- Source code in Python for TF-mRNN: a TensorFlow library for image captioning
- Source code in Python for the following work on image captioning:
- Image captioning:
  - Microsoft COCO datasets
- Visual Question Answering:
- Semantic Propositional Image Caption Evaluation (SPICE)
- Region-based Convolutional Neural Networks (R-CNN)
  - References:
    - Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN:
    Towards real-time object detection with region proposal networks." In Advances
    in Neural Information Processing Systems, pp. 91-99. 2015. [pdf]
    - Dai, Jifeng, Yi Li, Kaiming He, and Jian Sun. "R-FCN: Object detection via
    region-based fully convolutional networks." In Advances in Neural Information
    Processing Systems, pp. 379-387. 2016. [pdf] [source code]
    - Huang, Jonathan, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara,
    Alireza Fathi, Ian Fischer et al. "Speed/accuracy trade-offs for modern
    convolutional object detectors." arXiv preprint arXiv:1611.10012 (2016). [pdf]
    (E.g., for Inception V3, features are extracted from the “Mixed 6e” layer,
    whose stride is 16 pixels; feature maps are cropped and resized to 17x17.)
- Source codes:
  - Source code in Python for end-to-end training of LSTM
  - Bidirectional Encoder Representations from Transformers (BERT)
  - Source code in Python for sequence-to-sequence learning (language
  translation, chatbot)
- Visual Storytelling Dataset (VIST)
- Visual storytelling algorithms:
  - No Metrics Are Perfect: Adversarial REward Learning for Visual
  Storytelling: source codes (TensorFlow)
- Visual Genome is a dataset, a knowledge base, an ongoing effort to connect
structured image concepts to language.
- MPII Movie & Description dataset for automatic video description, video
summary, video storytelling
- Bidirectional recurrent neural networks (B-RNN):
  - Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech
  recognition with deep bidirectional LSTM." IEEE Workshop on Automatic Speech
  Recognition and Understanding (ASRU), 2013. [pdf]
- Deep reinforcement learning
  - UCL Course on reinforcement learning: [ppt] [video]
  - References:
    - Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis
    Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing Atari with deep
    reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
    - Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel
    Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through
    deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
    [source code]
  - How to Study Reinforcement Learning
  - Source codes:
    - Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym,
    TensorFlow. Exercises and Solutions to accompany Sutton's book and David
    Silver's course. [link]
- Generative Adversarial Network (GAN)
  - References:
    - Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
    Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative
    adversarial nets." In Advances in Neural Information Processing Systems,
    pp. 2672-2680. 2014.
    - Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised
    representation learning with deep convolutional generative adversarial
    networks." arXiv preprint arXiv:1511.06434 (2015).
    - Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein
    GAN." arXiv preprint arXiv:1701.07875 (2017).
  - Types of GAN
    - Vanilla GAN
    - Conditional GAN
    - InfoGAN
    - Wasserstein GAN
    - Mode Regularized GAN
    - Coupled GAN
    - Auxiliary Classifier GAN
    - Least Squares GAN
    - Boundary Seeking GAN
    - Energy Based GAN
    - f-GAN
    - Generative Adversarial Parallelization
    - DiscoGAN
    - Adversarial Feature Learning & Adversarially Learned Inference
    - Boundary Equilibrium GAN
    - Improved Training for Wasserstein GAN
    - DualGAN
    - MAGAN: Margin Adaptation for GAN
    - Softmax GAN
  - Source codes:
    - A TensorFlow implementation of "Deep Convolutional Generative Adversarial
    Networks": python code
    - Collection of generative models, e.g. GAN, VAE in PyTorch and TensorFlow:
    python code
- Sequential Generative Adversarial Network (GAN)
  - References:
    - Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence
    Generative Adversarial Nets with Policy Gradient." In AAAI, pp. 2852-2858.
    2017.
    - Mogren, Olof. "C-RNN-GAN: Continuous recurrent neural networks with
    adversarial training." arXiv preprint arXiv:1611.09904 (2016).
    - Im, Daniel Jiwoong, Chris Dongjoo Kim, Hui Jiang, and Roland Memisevic.
    "Generating images with recurrent adversarial networks." arXiv preprint
    arXiv:1602.05110 (2016).
    - Press, Ofir, Amir Bar, Ben Bogin, Jonathan Berant, and Lior Wolf. "Language
    Generation with Recurrent Generative Adversarial Networks without
    Pre-training." arXiv preprint arXiv:1706.01399 (2017).
  - Source codes:
- Stanford NLP Parser: A natural language parser is a program that works out the
grammatical structure of sentences.
- Performance metrics for a natural language parser
- Precision and recall
- mAP (mean Average Precision) for Object Detection
- Question answering
  - References:
  - Source codes:
  - Question answering datasets:
  - The General Language Understanding Evaluation (GLUE) benchmark is a
  collection of resources for training, evaluating, and analyzing natural
  language understanding systems.
  - Semantic Textual Similarity (STS) benchmark evaluation dataset
- Automatic text understanding and reasoning:
  - NLTK sentiment analysis tool
  - Opinion Lexicon (dictionary of sentiment words): Positive and Negative
- Human activity recognition
  - HMDB: a large human motion database
  - UCF101: Action Recognition Data Set
- Coronavirus dataset
- AI City Challenge
- Batch Normalization and Weight Decay Notes
- A powerful and flexible machine learning platform for drug discovery
- MATLAB Tutorial
- MATLAB Central
- MATLAB Primer, MATLAB Manuals, Image Processing Toolbox
- MATLAB implementation of image/video compression algorithms
- Matrix Reference Manual
- HIPR2: a WWW-based Image Processing Teaching Materials with J
- Learning by simulations
- OpenCV
- OpenGL
- A Recipe for Training Neural Networks (by Andrej Karpathy)
- Download the following free (open source) program to record video with screen
capture: http://www.nchsoftware.com/capture/index.html?gclid=CNadwsW6-6wCFSVjTAodbjzTSg
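The TensorFlow entry in the list above describes computation as a graph whose nodes are operations and whose edges carry arrays between them. The toy sketch below illustrates that idea in plain Python; it does not use TensorFlow's API, and `Node`, `evaluate`, `constant`, `add`, and `mul` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]

class Node:
    """A graph node: an operation applied to the values arriving along its input edges."""
    def __init__(self, name: str, op: Callable[..., Vector],
                 inputs: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.op = op
        self.inputs = inputs or []

def evaluate(node: Node, cache: Optional[Dict[str, Vector]] = None) -> Vector:
    """Evaluate a node's inputs first (depth-first), caching each node's output by name."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [evaluate(parent, cache) for parent in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]

def constant(name: str, values: Vector) -> Node:
    return Node(name, lambda: list(values))

def add(name: str, a: Node, b: Node) -> Node:
    return Node(name, lambda x, y: [xi + yi for xi, yi in zip(x, y)], [a, b])

def mul(name: str, a: Node, b: Node) -> Node:
    return Node(name, lambda x, y: [xi * yi for xi, yi in zip(x, y)], [a, b])

# Describe y = (a + b) * b as a graph; nothing is computed until evaluate() is called.
a = constant("a", [1.0, 2.0])
b = constant("b", [3.0, 4.0])
s = add("s", a, b)
y = mul("y", s, b)
print(evaluate(y))  # prints [12.0, 24.0]
```

Building the graph and evaluating it are separate steps, and the cache ensures that a node feeding several others (here `b`) is computed only once.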
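For the convolutional-network entries above (for example, the PyTorch handwritten-digit code), the core operation is a 2D convolution. The sketch below computes a "valid" (unpadded) 2D cross-correlation, which is what deep learning frameworks typically implement under the name convolution, followed by a ReLU. It is a pure-Python illustration with hypothetical function names, not code from the linked repositories.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """'Valid' 2D cross-correlation (no padding): slide the kernel and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[r0 + u][c0 + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out

def relu(m: Matrix) -> Matrix:
    """Elementwise max(0, x), the usual nonlinearity after a convolution layer."""
    return [[max(0.0, x) for x in row] for row in m]

# A 4x4 image that is dark on the left and bright on the right.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
# A 2x2 kernel that responds to dark-to-bright vertical edges.
edge = [[-1, 1],
        [-1, 1]]
print(relu(conv2d_valid(img, edge)))
```

The output is strongest in the middle column, where the edge lies. In a CNN the kernel values are learned rather than hand-set, and many kernels are applied in parallel to produce multiple feature maps.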
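The note under the Huang et al. reference describes extracting features at stride 16 and cropping and resizing each region to 17x17. A minimal sketch of that crop-and-resize step is shown below, assuming a single-channel feature map stored as nested lists, box coordinates given in image pixels, and bilinear sampling on an evenly spaced grid. Implementations differ in details such as box normalization and corner alignment, so treat this as an illustration rather than the method of any particular library; all function names are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels

def to_feature_space(box: Box, stride: int = 16) -> Box:
    """Divide pixel coordinates by the feature stride (16 image pixels per feature-map cell)."""
    x0, y0, x1, y1 = box
    return (x0 / stride, y0 / stride, x1 / stride, y1 / stride)

def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate a feature map at a fractional (y, x), clamped to the map edges."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def crop_and_resize(fmap: FeatureMap, box_pixels: Box,
                    out_size: int = 17, stride: int = 16) -> FeatureMap:
    """Sample an out_size x out_size grid spread evenly over the box (corners included)."""
    fx0, fy0, fx1, fy1 = to_feature_space(box_pixels, stride)
    step_y = (fy1 - fy0) / (out_size - 1)
    step_x = (fx1 - fx0) / (out_size - 1)
    return [[bilinear(fmap, fy0 + i * step_y, fx0 + j * step_x)
             for j in range(out_size)] for i in range(out_size)]

# Hypothetical 40x40 feature map whose value equals the column index.
fmap = [[float(c) for c in range(40)] for _ in range(40)]
crop = crop_and_resize(fmap, (0, 0, 320, 320))  # a 320x320-pixel box spans 20 cells
print(len(crop), len(crop[0]), crop[0][:3])
```

Because the crop always has the same 17x17 size, regions of any shape can be fed to the same second-stage classifier.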
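The deep reinforcement learning references above build on Q-learning, which moves each action-value estimate toward the target r + γ max_a' Q(s', a'). DQN approximates Q with a neural network and adds experience replay and a target network; the sketch below is plain tabular Q-learning, not DQN, on a hypothetical five-state chain where the only reward is for reaching the right-hand end. The environment and hyperparameters are illustrative assumptions.

```python
import random
from typing import Dict, Tuple

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
GOAL = N_STATES - 1
ACTIONS = (-1, 1)     # move left or move right

def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Move along the chain (clamped at 0); reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target; no bootstrapping from a terminal state
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Because Q starts at zero and all rewards are non-negative, the estimates approach the optimal values from below, and after training the greedy policy should move right (+1) in every non-terminal state.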
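For the GAN references above, the original formulation by Goodfellow et al. trains a generator $G$ (mapping noise $z$ to a sample) and a discriminator $D$ (estimating the probability that its input is real) on a two-player minimax objective:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

The Wasserstein GAN of Arjovsky et al. instead trains a critic $f$ to estimate the Wasserstein-1 distance between the real distribution $\mathbb{P}_r$ and the generated distribution $\mathbb{P}_g$ through its Kantorovich-Rubinstein dual, with the supremum taken over 1-Lipschitz functions; the generator is then trained to reduce this estimate:

$$W(\mathbb{P}_r, \mathbb{P}_g) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim \mathbb{P}_r}\big[f(x)\big] - \mathbb{E}_{x \sim \mathbb{P}_g}\big[f(x)\big]$$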
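The parser-metrics and precision/recall entries above can be made concrete with a small example. The sketch below computes precision, recall, and F1 from sets of predicted and reference items; the usage scores hypothetical constituent brackets written as (label, start, end) tuples, in the spirit of labeled-bracket parser evaluation. The brackets are invented for illustration.

```python
from typing import Hashable, Set, Tuple

def precision_recall_f1(predicted: Set[Hashable],
                        relevant: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision = TP / |predicted|, recall = TP / |relevant|, F1 = their harmonic mean."""
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical brackets for a 5-word sentence: (constituent label, start, end).
gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
predicted = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
print(precision_recall_f1(predicted, gold))
```

Here three of the four predicted brackets match the reference, so precision, recall, and F1 are all 0.75; mislabeling the span (3, 5) as PP instead of NP costs both precision and recall.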
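For the mAP entry above: object-detection AP is usually computed per class by ranking detections by confidence, counting a detection as a true positive if its intersection over union (IoU) with a not-yet-matched ground-truth box reaches a threshold (0.5 here), and taking the area under the resulting precision-recall curve; mAP is the mean of AP over classes. The sketch below handles one class in one image, uses greedy matching to the best unmatched ground-truth box, and uses all-point interpolation. Benchmarks such as PASCAL VOC and COCO differ in these details, so this is an illustrative assumption rather than any benchmark's official scorer; the function names and sample boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections: Sequence[Tuple[float, Box]],
                      gt_boxes: Sequence[Box], iou_thresh: float = 0.5) -> float:
    """AP for one class: rank by score, match greedily, integrate the interpolated PR curve."""
    if not gt_boxes:
        return 0.0
    matched = [False] * len(gt_boxes)
    tp_flags: List[bool] = []
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for k, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if not matched[k] and overlap > best_iou:
                best_iou, best_idx = overlap, k
        is_tp = best_idx >= 0 and best_iou >= iou_thresh
        if is_tp:
            matched[best_idx] = True  # each ground-truth box can be matched only once
        tp_flags.append(is_tp)

    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / len(gt_boxes))

    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the step-shaped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
print(average_precision(dets, gt))
```

In this example the false positive ranked second lowers the precision reached at full recall to 2/3, giving AP = 0.5 x 1.0 + 0.5 x 2/3, roughly 0.833. Over a full dataset, detections for a class would be pooled across images before ranking, and the per-class APs averaged to give mAP.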
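The NLTK sentiment tool and the Opinion Lexicon entries above point to lexicon-based sentiment analysis. A minimal sketch of the idea is shown below: count positive words minus negative words, flipping the polarity of a word that directly follows a negator. The tiny word sets are hypothetical stand-ins for the full positive and negative lists; with NLTK installed and its `opinion_lexicon` corpus downloaded, `nltk.corpus.opinion_lexicon.positive()` and `.negative()` could supply the full lists instead.

```python
import re
from typing import Set

# Tiny hypothetical stand-ins for the full positive/negative opinion-word lists.
POSITIVE: Set[str] = {"good", "great", "excellent", "helpful", "love"}
NEGATIVE: Set[str] = {"bad", "poor", "terrible", "boring", "hate"}
NEGATORS: Set[str] = {"not", "never", "no"}

def lexicon_score(text: str) -> int:
    """Positive-word count minus negative-word count; a preceding negator flips a word's polarity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = int(token in POSITIVE) - int(token in NEGATIVE)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

for review in ["The lectures were great and the TA was helpful.",
               "The tutorial was not good, honestly boring."]:
    print(lexicon_score(review), review)
```

A positive score suggests positive sentiment and a negative score negative sentiment. Handling negation only for the next word is a deliberate simplification; tools such as NLTK's VADER analyzer use more elaborate rules.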
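The batch normalization and weight decay notes above concern two common training details. Batch normalization normalizes each feature over a mini-batch and then rescales it, y = γ (x - μ_B) / sqrt(σ_B² + ε) + β; weight decay, for plain SGD, adds λw to the gradient before the update. The sketch below implements both in plain Python with hypothetical function names; it covers only the training-mode forward pass of batch normalization, not the running averages used at inference time.

```python
import math
from typing import List

def batch_norm(batch: List[List[float]], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[List[float]]:
    """Training-mode batch normalization: each feature column is normalized over the mini-batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased (divide-by-n) batch variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mean) * inv_std + beta[j]
    return out

def sgd_step_with_weight_decay(weights: List[float], grads: List[float],
                               lr: float = 0.1, weight_decay: float = 1e-4) -> List[float]:
    """One SGD step with L2 weight decay: w <- w - lr * (grad + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

# Two features on very different scales end up on the same scale after normalization.
normalized = batch_norm([[1.0, 10.0], [3.0, 30.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0], eps=0.0)
print(normalized)
print(sgd_step_with_weight_decay([1.0, -2.0], [0.5, 0.5], lr=0.1, weight_decay=0.01))
```

The learnable γ and β let the network undo the normalization where that helps, while weight decay steadily shrinks weights toward zero unless the loss gradient pushes back.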
Free books
Software:
- Virtual Dub: VirtualDub
is a video capture/processing utility for 32-bit Windows platforms
(95/98/ME/NT4/2000/XP), licensed under the GNU General Public License (GPL).
- XnView: an efficient multimedia viewer, browser, and converter.
- ImageJ: Read and write GIF,
JPEG, and ASCII. Read BMP, DICOM, and FITS. [Open Source, Public Domain]
- Open source for image processing tasks:
http://octave.sourceforge.net/doc/image.html
Related courses in other institutions:
JOURNALS
Elsevier
- Computer Vision and Image Understanding
- Journal of Visual Communication and Image Representation
- Data & Knowledge Engineering
- Image and Vision Computing
- Pattern Recognition
- Pattern Recognition Letters
IEEE
- IEEE Transactions on Circuits and Systems for Video Technology
- IEEE Transactions on Multimedia
- IEEE Transactions on Image Processing
- IEEE Transactions on Medical Imaging
- IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Computer Vision
Public Domain Image Databases
CMU Database
