CS4486 -- Artificial Intelligence

City University of Hong Kong

Department of Computer Science

Artificial Intelligence

Semester B, 2024/25

CS4486 is an undergraduate-level course in Artificial Intelligence (AI). It is designed to equip students with the knowledge and skills needed to solve problems using AI techniques. It is not a course on computer vision or natural language processing; instead, it is an entry-level course covering problem-solving methods such as search and optimization, logical systems with reasoning, and machine learning techniques.

Prerequisites

CS2310 Computer Programming or
CS2315 Computer Programming or
CS2334 Data Structures for Data Science or
CS2360 Java Programming

Textbook

There is no textbook for the course. All teaching materials will be from online sources.

The optional readings, unless explicitly specified otherwise, come from the book Artificial Intelligence: A Modern Approach, 3rd ed., by Stuart Russell and Peter Norvig.

Instructor:

Dr. Dapeng Wu
Office: Y6321, AC-1 Building
Email: dapengwu@cityu.edu.hk

TA:

1) Hong Huang

Email: hohuang-c@my.cityu.edu.hk

2) Yongcan Luo

Email: yongcaluo2-c@my.cityu.edu.hk

3) Hongming Piao

Email: hpiao6-c@my.cityu.edu.hk

4) Tianli Shi

Email: tianlishi2-c@my.cityu.edu.hk

5) Hao Wang

Email: hwang728-c@my.cityu.edu.hk

6) Shuguang Wang

Email: sgwang6-c@my.cityu.edu.hk

7) Yun Wang

Email: ywang3875-c@my.cityu.edu.hk

8) Renwei Yang

Email: renweyang2-c@my.cityu.edu.hk

9) Guanyi Zhao

Email: guanyzhao3-c@my.cityu.edu.hk

10) Jiahao Zheng

Email: jhzheng4-c@my.cityu.edu.hk

Course website: https://www.cs.cityu.edu.hk/~dapengwu/courses/CS4486s25

Meeting Time for Lectures

Friday, 9 am - 11:50 am

Meeting Room for Lectures

LT 18 (on Floor 4), AC-1 Building

Meeting Weeks for Tutorials

Tutorials will be given in Room B4702, AC-1 Building, in the first week through the 10th week (i.e., from Jan. 17 to March 28) for a total of 10 tutorials; note that there is no class/tutorial on Jan. 31. There are two sessions for the tutorials. The meeting times for tutorials are

13:00-13:50, Friday, instructors: Hong Huang, Hongming Piao

14:00-14:50, Friday, instructors: Tianli Shi, Renwei Yang

You only need to attend one session since the two sessions cover the same teaching materials.

Course Policies

During lectures, cell phones should be in silent mode.
Late submissions of homework solutions and project reports are not accepted unless advance permission is granted by the instructor.

Grading:

Component	Percentage	Due date
Weekly quiz	10%	In-class quiz
Homework	20%	To be announced
Project	20%	To be announced
Final exam	50%	April 28--May 13
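
As a rough illustration of how the percentages above combine, the following Python sketch computes a weighted overall mark. It assumes each component is marked out of 100, which the grading table does not state; the function name `overall_score` and the example marks are hypothetical.

```python
# Component weights in percent, copied from the grading table.
WEIGHTS = {
    "Weekly quiz": 10,
    "Homework": 20,
    "Project": 20,
    "Final exam": 50,
}


def overall_score(component_scores):
    """Return the weighted overall mark on a 0-100 scale.

    component_scores maps each component name to a mark out of 100.
    The 0-100 scale per component is an assumption for illustration.
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Multiply each mark by its weight in percent, then divide by 100.
    weighted_sum = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return weighted_sum / 100


# Hypothetical marks, not real data.
example = {"Weekly quiz": 90, "Homework": 80, "Project": 85, "Final exam": 70}
print(overall_score(example))  # prints 77.0
```

Because the final exam carries half of the total weight, a change of 10 marks on the exam moves the overall mark by 5, while the same change on the weekly quizzes moves it by 1.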

Class Project:

The class project will be done individually. Each student is expected to implement an AI technique to solve a real-world problem such as sales prediction, bird classification, spam detection, music genre classification, skin cancer classification, or games. Each student is expected to write a report documenting their work.

The course calendar can be found here.

Useful links

Anaconda: Anaconda is the leading open data science platform powered by Python.
Theano: Theano is a Python library that lets you define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays (numpy.ndarray).
TensorFlow: TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
Keras: Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
PyTorch: PyTorch is a deep learning framework for fast, flexible experimentation.
A curated list of resources dedicated to recurrent neural networks
Use the Keras platform to implement handwritten digit recognition, with a multi-layer perceptron: [link]
Source code in PyTorch for handwritten digit recognition, using 2D convolutional neural networks
Source code in Python for TF-mRNN: a TensorFlow library for image captioning
Source code in Python for the following work on image captioning:
- Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: A Neural Image Caption Generator, CVPR 2015
  - Implementation
Image captioning:
- Zhe Gan, et al., Semantic Compositional Networks for Visual Captioning, CVPR 2017
  - Implementation Source code in Python (Theano)
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: source codes (Caffe) and source codes (PyTorch)
Microsoft COCO datasets
Visual Question Answering:
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: source codes (Caffe) and VQA source code (PyTorch)
Semantic Propositional Image Caption Evaluation (SPICE)
- Source code in Java to calculate SPICE
Region-based Convolutional Neural Networks (R-CNN)
- References:
  - Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in neural information processing systems, pp. 91-99. 2015. [pdf]
  - Dai, Jifeng, Yi Li, Kaiming He, and Jian Sun. "R-FCN: Object detection via region-based fully convolutional networks." In Advances in neural information processing systems, pp. 379-387. 2016. [pdf] [source code]
  - Huang, Jonathan, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer et al. "Speed/accuracy trade-offs for modern convolutional object detectors." arXiv preprint arXiv:1611.10012 (2016). [pdf] (E.g., for Inception V3, extract features from the “Mixed 6e” layer whose stride size is 16 pixels. Feature maps are cropped and resized to 17x17.)
- Source codes:
  - A Faster Pytorch Implementation of Faster R-CNN (PyTorch)
  - Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: source codes (Caffe)
Source code in Python for end-to-end training of LSTM
- Implementation
Bidirectional Encoder Representations from Transformers (BERT)
- Implementation in TensorFlow
- Implementation in PyTorch
Source code in Python for sequence-to-sequence learning (language translation, chatbot)
- TensorFlow seq2seq library
- Implementation 1 on Tensorflow with separable encoder and decoder
- Implementation 2 on Keras
Visual Storytelling Dataset (VIST)
- Visual storytelling algorithms:
  - No Metrics Are Perfect: Adversarial REward Learning for Visual Storytelling: source codes (TensorFlow)
Visual Genome is a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language.
MPII Movie & Description dataset for automatic video description, video summary, video storytelling
Bidirectional recurrent neural networks (B-RNN):
- Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2013. [pdf]
Deep reinforcement learning
- UCL Course on reinforcement learning: [ppt] [video]
- References:
  - Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
  - Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533. [source code]
  - How to Study Reinforcement Learning
- Source codes:
  - Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course. [link]
Generative Adversarial Network (GAN)
- References:
  - Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.
  - Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
  - Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein GAN." arXiv preprint arXiv:1701.07875 (2017).
- Types of GAN
- Source codes:
  - A Tensorflow Implementation of "Deep Convolutional Generative Adversarial Networks": python code
  - Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow: python code
Sequential Generative Adversarial Network (GAN)
- References:
  - Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." In AAAI, pp. 2852-2858. 2017.
  - Mogren, Olof. "C-RNN-GAN: Continuous recurrent neural networks with adversarial training." arXiv preprint arXiv:1611.09904 (2016).
  - Im, Daniel Jiwoong, Chris Dongjoo Kim, Hui Jiang, and Roland Memisevic. "Generating images with recurrent adversarial networks." arXiv preprint arXiv:1602.05110 (2016).
  - Press, Ofir, Amir Bar, Ben Bogin, Jonathan Berant, and Lior Wolf. "Language Generation with Recurrent Generative Adversarial Networks without Pre-training." arXiv preprint arXiv:1706.01399 (2017).
- Source codes:
  - Implementation of C-RNN-GAN
  - Tensorflow Implementation of GAN modeling for sequential data
Stanford NLP Parser: A natural language parser is a program that works out the grammatical structure of sentences.
Performance metrics for a natural language parser
Precision and recall
mAP (mean Average Precision) for Object Detection
Question answering
- References:
  - Seo, Minjoon, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. "Bidirectional attention flow for machine comprehension." arXiv preprint arXiv:1611.01603 (2016).
- Source codes:
  - Bi-Directional Attention Flow (BIDAF)
- Question answering datasets:
  - Stanford Question Answering Dataset (SQuAD)
  - NewsQA
  - MS MARCO
The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
Semantic Textual Similarity (STS) benchmark evaluation dataset
Automatic text understanding and reasoning:
- Facebook bAbI project
- Facebook dataset
- Python code
NLTK sentiment analysis tool
Opinion Lexicon (dictionary of sentiment words): Positive and Negative
Human activity recognition
- HMDB: a large human motion database
  - Action recognition algorithms
- UCF101: Action Recognition Data Set
  - Activity recognition algorithms
Coronavirus dataset
AI City Challenge
Batch Normalization and Weight Decay Notes
A powerful and flexible machine learning platform for drug discovery
MATLAB Tutorial
MATLAB Central
MATLAB Primer, MATLAB Manuals, Image Processing Toolbox
MATLAB implementation of image/video compression algorithms
Matrix Reference Manual
HIPR2: a WWW-based Image Processing Teaching Materials with J
Learning by simulations
OpenCV
OpenGL
A Recipe for Training Neural Networks (by Andrej Karpathy)
Download the following free (open source) program to record video with screen capture: http://www.nchsoftware.com/capture/index.html?gclid=CNadwsW6-6wCFSVjTAodbjzTSg

Free books

Introduction to Matrix Algebra (free book by Autar K Kaw, Professor, University of South Florida).
Mathematics for Machine Learning
Top 13 (free) must-read machine learning books for beginners
100+ free machine learning books

Software:

Virtual Dub: VirtualDub is a video capture/processing utility for 32-bit Windows platforms (95/98/ME/NT4/2000/XP), licensed under the GNU General Public License (GPL).
XnView: an efficient multimedia viewer, browser, and converter.
ImageJ: Read and write GIF, JPEG, and ASCII. Read BMP, DICOM, and FITS. [Open Source, Public Domain]
Open source for image processing tasks: http://octave.sourceforge.net/doc/image.html

Related courses in other institutions:

Stanford University CS221: Artificial Intelligence: Principles and Techniques: [video]
Stanford University CS224n: Natural Language Processing with Deep Learning: [video]
Stanford University CS229: Machine Learning: notes and videos can be found on this website
Stanford University CS230 Deep Learning: [ppt] [video]
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition: [ppt] [video]
UCL Course on reinforcement learning: [ppt] [video]
RWTH Aachen University Implementation of Heuristic Algorithms for Board Games
Mila - Quebec AI Institute Introduction to Causal Inference
UC Berkeley Foundations of Deep Reinforcement Learning
DeepMind Reinforcement Learning Lecture Series

JOURNALS

Elsevier

Computer Vision and Image Understanding
Journal of Visual Communication and Image Representation
Data & Knowledge Engineering
Image and Vision Computing
Pattern Recognition
Pattern Recognition Letters

IEEE

IEEE Transactions on Circuits and Systems for Video Technology
IEEE Transactions on Multimedia
IEEE Transactions on Image Processing
IEEE Transactions on Medical Imaging
IEEE Transactions on PAMI

Computer Vision

Computer Vision Homepage at CMU
Annotated Computer Vision Bibliography from USC IRIS
CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision
3-D for Everyone
Red-blue glasses or anaglyph for 3D viewing: http://www.best3dglasses.com/anaglyph.html
Shutter glasses for 3D viewing: http://www.stereo3d.com/shutter.htm
3D cameras: http://www.ptgrey.com/index.asp
3D photos at http://www.jessemazer.com/3Dphotos.html
3D video sequences can be downloaded at: http://research.microsoft.com/vision/InteractiveVisualMediaGroup/3DVideoDownload/

Public Domain Image Databases

CMU Database
