
City University of Hong Kong

Department of Computer Science

Artificial Intelligence -- Past, Present, and Future

Semester A, 2022/23

This AI course is suitable for both technical and non-technical students. Its first objective is to provide an overall view of what AI is, how it has developed over the past decades, its current trends, and its potential future directions. It will cover the impact of AI on society and business. Through case studies, students will gain better insight into different AI technologies and how they can be used to address a wide range of social and business needs. The course will broaden students' understanding of the current state of the art in AI and of future trends, as well as how the needs of different industries can be addressed through innovative use of AI. The second objective of this course is to help students become creative innovators in applying "AI first" concepts to solving real-world problems through project-based work. This course will be useful for students from any discipline and will give insight into the value of AI across industries from a global point of view, as well as issues related to its ethical use. To make this course accessible to as many people as possible, from any background, no programming is required and no prior programming skills are assumed.

Textbook

There is no textbook for the course. All teaching materials will be from online sources.

Instructor:

Dr. Dapeng Wu
Office: Y6307, AC-1 Building
Email: dapengwu@cityu.edu.hk

TA:

1) Weiwei Fu
Email: weiweifu2-c@my.cityu.edu.hk

2) Meng Xu
Email: mxu247-c@my.cityu.edu.hk

3) Zihao Wen
Email: zihaowen2-c@my.cityu.edu.hk

4) Haifeng Guo
Email: haifenguo2-c@my.cityu.edu.hk

5) Jinpeng Chen
Email: jinpechen2-c@my.cityu.edu.hk

6) Yinglan Feng
Email: yinglfeng2-c@my.cityu.edu.hk

Course website: https://www.cs.cityu.edu.hk/~dapengwu/courses/GE2340f22

Meeting Time for Lectures

Monday, 3 pm - 5:50 pm

Meeting Room for Lectures

LT 5 (on Floor 4), AC-1 Building

Meeting Weeks for Tutorials

Tutorials will be given in Room 4412, AC-2 Building, from the 4th week through the 11th week (i.e., from Sept. 19 to Nov. 11), for a total of 8 tutorials. There are three tutorial sessions, which meet at the following times:

17:00-17:50, Tuesday, instructors: Zihao Wen (lead), Haifeng Guo

18:00-18:50, Tuesday, instructors: Meng Xu (lead), Jinpeng Chen

18:00-18:50, Friday, instructors: Weiwei Fu (lead), Yinglan Feng

You only need to attend one session since the three sessions cover the same teaching materials.

Course Policies

During lectures, cell phones should be in silent mode.
Late submissions of homework solutions and project reports are not accepted unless advance permission is granted by the instructor.

Grading:

Component	Percentage	Due date
Weekly quiz	10%	In-class quiz
Reading report	10%	4pm, Oct. 10
Term project	40%	4pm, Nov. 26
Final exam	40%	Dec. 5--17
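
As a rough illustration of how the percentages in the grading table combine, the following minimal Python sketch computes an overall mark as a plain weighted sum. The weights are taken from the table; the weighted-sum rule itself, the function name weighted_total, and the example marks are assumptions made for illustration only, and the actual grading scheme may differ.

```python
# Weights taken from the grading table; they should sum to 100%.
WEIGHTS = {
    "Weekly quiz": 0.10,
    "Reading report": 0.10,
    "Term project": 0.40,
    "Final exam": 0.40,
}


def weighted_total(scores):
    """Return the overall mark (0-100) as a weighted sum of component marks.

    `scores` maps each component name to a mark out of 100.
    Assumption: a plain weighted sum; the actual grading scheme may differ.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical marks, for illustration only.
example_scores = {
    "Weekly quiz": 90,
    "Reading report": 80,
    "Term project": 85,
    "Final exam": 75,
}
overall = weighted_total(example_scores)
print(f"Overall mark: {overall:.1f}")
```

Under these assumptions, the term project and the final exam each count four times as much as the weekly quizzes or the reading report.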

Class Project:

The class project will be done individually. Each student is expected to write a report documenting their research, critical comparison and analysis, and new ideas. For details about the project, please read here.

Suggested topics for projects are listed here.

Useful links

Anaconda: Anaconda is the leading open data science platform powered by Python.
Theano: Theano is a Python library that lets you define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays (numpy.ndarray).
TensorFlow: TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
Keras: Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
PyTorch: PyTorch is a deep learning framework for fast, flexible experimentation.
A curated list of resources dedicated to recurrent neural networks
Use the Keras platform to implement handwritten digit recognition, with a multi-layer perceptron: [link]
Source code in PyTorch for handwritten digit recognition, using 2D convolutional neural networks
Source code in Python for TF-mRNN: a TensorFlow library for image captioning
Source code in Python for the following work on image captioning:
- Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: A Neural Image Caption Generator, CVPR 2015
  - Implementation
Image captioning:
- Zhe Gan, et al., Semantic Compositional Networks for Visual Captioning, CVPR 2017
  - Implementation Source code in Python (Theano)
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: source codes (Caffe) and source codes (PyTorch)
Microsoft COCO datasets
Visual Question Answering:
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: source codes (Caffe) and VQA source code (PyTorch)
Semantic Propositional Image Caption Evaluation (SPICE)
- Source code in Java to calculate SPICE
Region-based Convolutional Neural Networks (R-CNN)
- References:
  - Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in neural information processing systems, pp. 91-99. 2015. [pdf]
  - Dai, Jifeng, Yi Li, Kaiming He, and Jian Sun. "R-FCN: Object detection via region-based fully convolutional networks." In Advances in neural information processing systems, pp. 379-387. 2016. [pdf] [source code]
  - Huang, Jonathan, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer et al. "Speed/accuracy trade-offs for modern convolutional object detectors." arXiv preprint arXiv:1611.10012 (2016). [pdf] (E.g., for Inception V3, extract features from the “Mixed 6e” layer whose stride size is 16 pixels. Feature maps are cropped and resized to 17x17.)
- Source codes:
  - A Faster Pytorch Implementation of Faster R-CNN (PyTorch)
  - Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: source codes (Caffe)
Source code in Python for end-to-end training of LSTM
- Implementation
Bidirectional Encoder Representations from Transformers (BERT)
- Implementation in TensorFlow
- Implementation in PyTorch
Source code in Python for sequence-to-sequence learning (language translation, chatbot)
- TensorFlow seq2seq library
- Implementation 1 on Tensorflow with separable encoder and decoder
- Implementation 2 on Keras
Visual Storytelling Dataset (VIST)
- Visual storytelling algorithms:
  - No Metrics Are Perfect: Adversarial REward Learning for Visual Storytelling: source codes (TensorFlow)
Visual Genome is a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language.
MPII Movie & Description dataset for automatic video description, video summary, video storytelling
Bidirectional recurrent neural networks (B-RNN):
- Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2013. [pdf]
Deep reinforcement learning
- UCL Course on reinforcement learning: [ppt] [video]
- References:
  - Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
  - Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533. [source code]
  - How to Study Reinforcement Learning
- Source codes:
  - Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course. [link]
Generative Adversarial Network (GAN)
- References:
  - Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.
  - Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
  - Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein GAN." arXiv preprint arXiv:1701.07875 (2017).
- Types of GAN
- Source codes:
  - A Tensorflow Implementation of "Deep Convolutional Generative Adversarial Networks": python code
  - Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow: python code
Sequential Generative Adversarial Network (GAN)
- References:
  - Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." In AAAI, pp. 2852-2858. 2017.
  - Mogren, Olof. "C-RNN-GAN: Continuous recurrent neural networks with adversarial training." arXiv preprint arXiv:1611.09904 (2016).
  - Im, Daniel Jiwoong, Chris Dongjoo Kim, Hui Jiang, and Roland Memisevic. "Generating images with recurrent adversarial networks." arXiv preprint arXiv:1602.05110 (2016).
  - Press, Ofir, Amir Bar, Ben Bogin, Jonathan Berant, and Lior Wolf. "Language Generation with Recurrent Generative Adversarial Networks without Pre-training." arXiv preprint arXiv:1706.01399 (2017).
- Source codes:
  - Implementation of C-RNN-GAN
  - Tensorflow Implementation of GAN modeling for sequential data
Stanford NLP Parser: A natural language parser is a program that works out the grammatical structure of sentences.
Performance metrics for a natural language parser
Precision and recall
mAP (mean Average Precision) for Object Detection
Question answering
- References:
  - Seo, Minjoon, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. "Bidirectional attention flow for machine comprehension." arXiv preprint arXiv:1611.01603 (2016).
- Source codes:
  - Bi-Directional Attention Flow (BIDAF)
- Question answering datasets:
  - Stanford Question Answering Dataset (SQuAD)
  - NewsQA
  - MS MARCO
The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
Semantic Textual Similarity (STS) benchmark evaluation dataset
Automatic text understanding and reasoning:
- Facebook bAbI project
- Facebook dataset
- Python code
NLTK sentiment analysis tool
Opinion Lexicon (dictionary of sentiment words): Positive and Negative
Human activity recognition
- HMDB: a large human motion database
  - Action recognition algorithms
- UCF101: Action Recognition Data Set
  - Activity recognition algorithms
Coronavirus dataset
AI City Challenge
Batch Normalization and Weight Decay Notes
A powerful and flexible machine learning platform for drug discovery
MATLAB Tutorial
MATLAB Central
MATLAB Primer, MATLAB Manuals, Image Processing Toolbox
MATLAB implementation of image/video compression algorithms
Matrix Reference Manual
HIPR2: a WWW-based Image Processing Teaching Materials with J
Learning by simulations
OpenCV
OpenGL
A Recipe for Training Neural Networks (by Andrej Karpathy)
Download the following free (open source) program to record video with screen capture: http://www.nchsoftware.com/capture/index.html?gclid=CNadwsW6-6wCFSVjTAodbjzTSg

Free books

Introduction to Matrix Algebra (free book by Autar K Kaw, Professor, University of South Florida).
Mathematics for Machine Learning
Top 13 (free) must-read machine learning books for beginners
100+ free machine learning books

Software:

Virtual Dub: VirtualDub is a video capture/processing utility for 32-bit Windows platforms (95/98/ME/NT4/2000/XP), licensed under the GNU General Public License (GPL).
XnView: XnView is an efficient multimedia viewer, browser and converter.
ImageJ: Read and write GIF, JPEG, and ASCII. Read BMP, DICOM, and FITS. [Open Source, Public Domain]
Open source for image processing tasks: http://octave.sourceforge.net/doc/image.html

Related courses in other institutions:

Stanford University CS221: Artificial Intelligence: Principles and Techniques: [video]
Stanford University CS224n: Natural Language Processing with Deep Learning: [video]
Stanford University CS229: Machine Learning: notes and video can be found on this website
Stanford University CS230 Deep Learning: [ppt] [video]
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition: [ppt] [video]
UCL Course on reinforcement learning: [ppt] [video]
RWTH Aachen University: Implementation of Heuristic Algorithms for Board Games
Mila - Quebec AI Institute: Introduction to Causal Inference
UC Berkeley: Foundations of Deep Reinforcement Learning
DeepMind: Reinforcement Learning Lecture Series

Journals

Elsevier

Computer Vision and Image Understanding
Journal of Visual Communication and Image Representation
Data & Knowledge Engineering
Image and Vision Computing
Pattern Recognition
Pattern Recognition Letters

IEEE

IEEE Transactions on Circuits and Systems for Video Technology
IEEE Transactions on Multimedia
IEEE Transactions on Image Processing
IEEE Transactions on Medical Imaging
IEEE Transactions on PAMI

Computer Vision

Computer Vision Homepage at CMU
Annotated Computer Vision Bibliography from USC IRIS
CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision
3-D for Everyone
Red-blue glasses or anaglyph for 3D viewing: http://www.best3dglasses.com/anaglyph.html
Shutter glasses for 3D viewing: http://www.stereo3d.com/shutter.htm
3D cameras: http://www.ptgrey.com/index.asp
3D photos at http://www.jessemazer.com/3Dphotos.html
3D video sequences can be downloaded at: http://research.microsoft.com/vision/InteractiveVisualMediaGroup/3DVideoDownload/

Public Domain Image Databases

CMU Database

GE 2340