BSc in
Computer Science

Final Year Project Showcase


Year 2020-2021

Academic and Formal Writing Style Rewriter

Subject Areas

Natural Language Processing; Natural Language Generation; Transformer Model


Objectives

To rewrite informal sentences in a formal style with academic writing features; To generate a new corpus, GYAFC-academic, for the above task

Abstract

The academic writing style taught in colleges and universities aims to assist scholars and students in communicating precisely. On the other hand, the formal writing style has wider application scenarios in business and industry. This project proposes a new task: rewriting informal sentences in a formal style with academic writing features. To accomplish this task, a new corpus, the GYAFC-academic dataset, is generated and used in the training process. By using a transformer model and warm-starting mechanisms, the proposed models perform well in style transfer accuracy and outperform the benchmark models by a significant margin in terms of grammar accuracy.

Data Valuation in Machine Learning and Federated Learning

Subject Areas

Federated Learning; Incentive Mechanism; Data Valuation


Objectives

To evaluate the quality of local clients in the context of federated learning; To achieve efficient data valuation-based incentive mechanisms

Abstract

Federated learning is a promising framework for training a collaborative machine learning model on dispersed data. Incentive mechanisms are thus introduced to motivate clients to contribute data in the context of federated learning. To facilitate these mechanisms, data valuation is a state-of-the-art solution for fairly measuring clients' data quality to determine payoffs. However, it suffers from high computation and communication overheads. In this project, a round-based data valuation (RDV) approach is proposed to estimate data quality efficiently. It also helps to train better-performing models.

A Game Generator for Sliding Puzzles

Subject Areas

Intelligent System; Game Generator; Optimal Algorithms


Objectives

To propose a pivotal algorithm for generating the corresponding game codes; To develop a game framework for defining game logic of several sliding puzzles; To generate an image interpreter for processing the image input.

Abstract

A sliding puzzle, or sliding block game, is an interesting and challenging game. In each step, players move one block horizontally or vertically without overlapping other blocks or leaving the game board. The main target is to use as few steps as possible to reach an end configuration. However, little work has been devoted to developing an intelligent and compatible system that can automatically generate a series of sliding puzzles. In this project, a novel game generator for sliding puzzles is designed and implemented, producing different types of complex sliding puzzles with optimal solutions. Several methods and techniques are used, including multi-source Breadth-First Search, fast hash operations, and image processing. As a result, a powerful game generator is achieved for generating three kinds of complex sliding puzzles with optimal solutions, i.e., Klotski, 15-puzzle, and Sokoban. Besides, a complete search on Klotski is successfully carried out, generating the most complicated Klotski puzzle games. The generator allows users to design and produce diverse sliding puzzles by reading and processing information from pictorial or text inputs. It is also scalable to create many other categories of sliding puzzles automatically.
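
As a rough illustration of the multi-source Breadth-First Search mentioned above, the sketch below starts the search from every end configuration at once, so each reachable state is labelled with the length of its optimal solution. It is a minimal sketch under stated assumptions, not the project's generator: the 2 x 3 tile board, the state encoding, and the names multi_source_bfs and tile_neighbors are all hypothetical.

```python
from collections import deque


def multi_source_bfs(sources, neighbors):
    """Return {state: fewest moves to the nearest source state}."""
    dist = {state: 0 for state in sources}
    queue = deque(dist)
    while queue:
        state = queue.popleft()
        for nxt in neighbors(state):
            if nxt not in dist:  # first visit is the shortest in BFS
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


def tile_neighbors(rows, cols):
    """Build a neighbor function for a rows x cols sliding-tile board.

    A state is a tuple read row by row; 0 marks the empty cell.
    (Hypothetical encoding chosen for this sketch.)
    """
    def neighbors(state):
        blank = state.index(0)
        r, c = divmod(blank, cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                other = nr * cols + nc
                board = list(state)
                board[blank], board[other] = board[other], board[blank]
                yield tuple(board)
    return neighbors


goal = (1, 2, 3, 4, 5, 0)
dist = multi_source_bfs([goal], tile_neighbors(2, 3))
# The state farthest from every end configuration is the "hardest" puzzle.
hardest = max(dist, key=dist.get)
```

Because every move in this toy puzzle is reversible, searching backwards from the end configurations gives the same distances as solving each puzzle forwards, and picking the state with the largest distance yields a hardest starting position.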

Visualization for Spatial Transcriptomics Data

Subject Areas

Spatial Transcriptomics; Bioinformatics Visualization; Single-Cell Studies; Web-Based Visualization


Objectives

To serve as a novel example of web-based visualization applications written in TypeScript; To provide substantial analysis power and better flexibility for observing and analyzing spatial transcriptomics data, and to become a good aid for genomic research.

Abstract

Spatial transcriptomics is a series of novel methods that enable quantitative spatial analysis of transcriptomes in individual tissue sections. Although several tools and packages for spatial transcriptomics data now exist, a more flexible platform is in demand to satisfy the analytic needs of biological research. This project develops a novel online tool to display and examine spatial transcriptomic data. It creates comprehensive modules for interacting with and customizing spatial transcriptomic data visualizations. The visualization modules built include a correlation plot, 2-D and 3-D embedding maps, a U-map, a violin plot, and a deconvolution plot. the actual visualization power of these modules in existing transcriptomics research projects. It also features visualizations of ten datasets as an output of this project, including tissue slices from distinctive organs such as the human brain or the mouse kidney.

Application of Machine Learning to Classify Mobile App Reviews

Subject Areas

Machine Learning; Text Analysis; Natural Language Processing; Sentiment Analysis


Objectives

To classify app reviews automatically; To improve the progress of the software maintenance and evolution for app developers

Abstract

App stores allow users to download and buy software apps, and to share feedback on installed apps with star ratings and text comments. Based on app reviews, app developers can improve or maintain their apps by fixing bugs, enhancing features, and adding new functions. However, the vast number of user reviews of varying quality, and the mixed sentiments within a single review, significantly affect the progress of the software maintenance and evolution done by developers. In this project, an automated approach is proposed to classify app reviews into four pre-defined categories, which helps developers maintain and evolve apps. Different machine learning algorithms are trained using different features from three extraction techniques: Text Analysis, Natural Language Processing, and Sentiment Analysis. A comparison among all ML algorithms shows that the combined use of the feature extraction techniques achieves the best results (a precision of 74% and a recall of 72%) with Logistic Regression.

An AI Rope Skipping Coaching, Training Data Recording, and Social Sharing for Normal People and Sports Players

Subject Areas

Artificial Intelligence; Mobile Application Development


Objectives

To develop a mobile application for counting jumps in the sport of rope skipping; To encourage people to exercise regularly

Abstract

Recently, people have been paying more attention to weight and sub-optimal health issues, and sport is an effective way to address them. Because home space is limited, rope skipping, which requires little space and can be done individually, is a good choice for Hong Kong people to practise at home. Although there are many existing tools for evaluating sports performance, such as pedometer applications for monitoring running activities, there is no comprehensive tool on the market for monitoring rope skipping training. This project aims to develop a mobile application that provides automatic rope skipping counting and sports data recording functions for normal people, sports players, sports trainers, and judges. Also, to promote this sport and encourage people to build up a regular exercise habit, the application provides a social media sharing function for users to share their training records and motivate each other. The project introduces a multi-tracking-point, markerless, single-camera mobile application for capturing the jumping action and counting jumps and trips.

Artificial Intelligence for Classical Music composition in different eras

Subject Areas

Artificial Intelligence; AI Music Generation; Deep Learning


Objectives

To generate classical music for composers, musicians, or even non-specialists without prior knowledge of classical music theories and backgrounds according to their favorite musical eras.

Abstract

Artificial Intelligence could bring music composition to another level with limitless possibilities, as an assistant for human musicians or as an AI musician itself. In the digital era, classical music plays a dominant role in commercial films, movie trailers, game soundtracks, and more. However, there are no existing works that generate classical music in different eras. To fill this gap, this project proposes an AI music generator for classical music, so that composers, musicians, or even non-specialists without prior knowledge of classical music theories and backgrounds can quickly compose classical music according to their favorite musical eras for many practical purposes. It uses generative models, i.e., Bi-LSTM and CNNGAN, to compose classical music for some particular classical music genres, and evaluates their performance respectively and collectively.

Year 2019-2020

Object Imaging on Mobile Devices Utilizing Acoustic Signals

Subject Areas

Data Visualization; Machine Learning; Systems Design; Ubiquitous and Mobile Computing


Objectives

To focus on a solution for object imaging on mobile devices via acoustic signals

Abstract

Object imaging using different kinds of signals, including optical signals (visible light), radio frequency signals [11, 12], and acoustic signals (human audible [3] and inaudible), is not a new topic. In this project, a solution in the form of an iOS mobile application using 20 Hz-20 kHz acoustic signals will be designed, with machine learning and an extension to ultrasonic signals in the late stage. As deliverables of this project, apart from the imaging system, an attack model and a gesture classification system are designed.

Finger Motion Tracking Using Acoustic Signals

Subject Areas

IoT; Machine Learning; Mobile Application


Objectives

To implement a prototype for finger motion tracking in real time; To find the factors that could affect the accuracy of finger motion tracking; To improve the performance of finger motion tracking

Abstract

Nowadays, more and more people own and use smart devices such as smartphones, and these devices have become a part of people's lives. However, apart from voice control, most interaction with these devices happens directly through touch. Therefore, in some cases, people find it difficult to use these smart devices. For example, when the phone is in a pocket, the user cannot touch it directly, which means he or she cannot interact with it to answer a call or adjust the volume. In this project, tracking the user's finger motion to control the smart device provides another interface for sending commands as input to the device.

A Comprehensive Learning Framework for Sampling-based Motion Planning in Autonomous Driving

Subject Areas

Algorithms; Artificial Intelligence; Data Analysis; Data Mining


Objectives

To define an optimistic route and prepare a dataset containing such routes; To incorporate deep learning into the point-sampling process to speed up the search for an optimistic route; To incorporate user experience and safety into the route-search process to prune candidates and speed up the search

Abstract

Route planning has been a classic problem in the autonomous driving area. With the development of computer vision and sensing techniques, autonomous vehicles have gained the capability of capturing rich environment information for driving. However, how to utilize both internal and external information to plan an optimistic route is still not satisfactorily solved. To produce an optimistic path, user experience, efficiency, accuracy and safety should all be taken into consideration. Current state-of-the-art algorithms pay more attention to efficiency and accuracy, but they tend to ignore the importance of user experience and safety. For instance, it is not desirable for the vehicle to drive at high speed in urban areas, or with high angular velocity when road conditions are bad (e.g. on rainy days). Besides, the points used to construct the path are randomly sampled in the current algorithm, which can be time-consuming when noisy points are selected and used to extend the path. We believe that with a learning algorithm involved in the sampling process, we are able to produce more high-quality sampling points and thus eliminate the bad effect of useless points. This project therefore focuses on an autonomous driving system that takes all these factors into account to provide an optimistic route planning algorithm.

Prediction Model for Stock Market

Subject Areas

Data Analysis; Data Science; Machine Learning


Objectives

To provide a probabilistic measure on whether the next day's stock price increases or decreases by comparing it with today's closing price.

Abstract

Investors keep optimizing their algorithms and models for predicting stock movement, since it is hard to estimate future market dynamics, which are affected by different factors such as foreign market news, the effects of correlated stocks, and government politics. Investors therefore use different approaches, like fundamental analysis and technical analysis, with various sources of data. This paper attempts another approach to predicting stock price movement. Instead of telling exactly how much the stock price increases or decreases, this paper aims to provide a probabilistic measure of whether the next day's stock price increases or decreases compared with today's closing price. Investors can make better buy/sell decisions based on the score, according to the risk that they can bear or the risk diversification strategy of their financial portfolio. Five stocks in the Hong Kong sector are selected to be the target stocks of the prediction.

Predictive Analysis on Football Match Result

Subject Areas

Data Mining; Data Science


Objectives

To adapt different predictive models to predict the match result; To figure out the amount of uncertainty reduced and re-examine the statement - football is unpredictable.

Abstract

Football has become one of the most popular sports in the world, and it has developed rapidly, with billions of fans and viewers worldwide. The big five football leagues in the world - Premier League, LaLiga, Bundesliga, Serie A and Ligue 1 - have many fans who care about how the teams they support perform. In this project, a predictive model is built to predict the ranking of the teams in each of the five leagues for the coming season.

Contextual Learning in Recommender Systems

Subject Areas

Algorithms; Data Science; Machine Learning; Theoretical Analysis


Objectives

To recommend a suitable insurance plan to the user; To avoid high fees being charged by agents/sales

Abstract

Recommender systems have become a major component of almost every Internet system nowadays, such as Taobao, eBay, Amazon, and TikTok. In this project, we will study recent advances in recommender systems, with a focus on contextual recommendations. Here, contextual information refers to various situational information, such as time, location, and browsing history, that can influence user preference for items. The main focus is insurance recommendation: the system recommends the most suitable insurance to the user based on the user's personal information, such as age, height, weight, BMI, habits, digital footprint, income and so on.

Algorithms for data visualization

Subject Areas

Algorithms Design & Analysis; Bioinformatics; Data Visualization


Objectives

To design algorithms that represent data in different tree structures; To optimize the algorithms to improve performance and layout

Abstract

Although many frameworks nowadays support data visualization, there is no framework for people who want to customize the presentation of their data as tree structures. Therefore, in this project, I am going to design generic, compatible and optimized algorithms to represent data in different tree structures.

Deciphering Bulk Tissue Cell Type Proportions with a Deep Learning

Subject Areas

Algorithms; Bioinformatics


Objectives

Given the mutational data, to extract the processes that caused the cancer (mutational signatures) and their corresponding exposures; To improve speed or accuracy over the current solution.

Abstract

All cancers are caused by somatic mutations. A software tool will be developed to explain what processes may cause the somatic mutations. That will help to find potential therapy for cancer patients.

Year 2018-2019

Awari game in Facebook Chatbot

Subject Areas

Mobile Application Development


Objectives

To develop an AI for each user, who can educate and train the AI to battle with other users. Users can train their own AI by playing games. The application will contain three different games, which apply data science concepts.

Abstract

Computer Science Challenge (CSC) is an eSport-like game tournament for secondary school students with an interest in computer science. It is an opportunity for students to test their ability and knowledge in different areas of computer science. Students learn computer programming and also play to learn computer science and mathematics. This project aims to develop innovative gamification software for people (children and adults) to learn mathematics while playing games. Moreover, it includes the front-end of the learning engine, used to create a mobile app, and a recommendation engine for learning through data science.

Faster Video Super-Resolution System

Subject Areas

Computer Networks; Computer Vision; Machine Learning


Objectives

To build a faster video super-resolution system that can make inference in real-time with normal GPUs.

Abstract

Deep neural networks (DNNs) have shown superior performance on the image super-resolution (SR) task in the past few years. The SR technique can be applied to content delivery networks to lower the bandwidth requirement for high-resolution videos: low-resolution videos are sent to the clients and processed locally using a DNN to generate the corresponding high-resolution videos. However, DNNs are computationally expensive compared with other SR approaches, which makes it hard for some client devices to achieve real-time video super-resolution (usually 30 fps). In order to achieve both good visual quality and fast inference speed, we choose to utilize the correlation between adjacent frames. Instead of doing SR frame by frame, we apply SR only to a subset of frames in each group of pictures (GOP) and transfer the SR result to the other frames. We can manually choose the transfer ratio. In this way, we can trade off visual quality against inference speed.
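
The frame-selection idea above can be sketched as a simple schedule for one GOP. This is only an illustrative sketch under stated assumptions: the abstract does not define the transfer ratio or say which frames are chosen, so the parameter sr_ratio (the fraction of frames super-resolved directly), the even spacing of those frames, and the function name plan_gop are all hypothetical.

```python
def plan_gop(gop_size, sr_ratio):
    """Plan which frames of one GOP get full SR and which reuse a result.

    sr_ratio is the (assumed) fraction of frames super-resolved directly,
    with 0 < sr_ratio <= 1. Entry i of the returned list is the index of
    the SR frame that frame i relies on; a frame that relies on itself is
    super-resolved directly, any other frame receives a transferred result.
    """
    if gop_size < 1 or not 0 < sr_ratio <= 1:
        raise ValueError("gop_size must be >= 1 and 0 < sr_ratio <= 1")
    n_sr = max(1, round(gop_size * sr_ratio))
    # Spread the directly super-resolved frames evenly across the GOP;
    # frame 0 is always one of them.
    sr_frames = sorted({(i * gop_size) // n_sr for i in range(n_sr)})
    plan = []
    for i in range(gop_size):
        # Reuse the most recent SR frame at or before frame i.
        plan.append(max(f for f in sr_frames if f <= i))
    return plan
```

With this schedule, a lower sr_ratio means fewer expensive DNN passes per GOP (faster inference) at the cost of more frames relying on transferred results (potentially lower visual quality), which is the trade-off the abstract describes.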

Peer-to-peer Mobile Payment

Subject Areas

Mobile Application Development; Security


Objectives

Payer and payee can conduct a payment transaction using their mobile phones securely.

Abstract

With the rapid growth of popularity and customer acceptance, more merchants are willing to accept mobile payment at their stores, allowing users to make contactless commerce transactions at a point-of-sale (POS) terminal, a.k.a. Business-to-Customer (B2C) mobile payment. While these payment applications are broadly accepted in today's commerce environment, most of them require a POS terminal to work with. However, for many small-scale businesses like small restaurants, boutiques, food stalls, tuck shops, or pop-up stores, the cost of setting up a POS terminal that supports mobile payment may make them hesitate. Thus, this project focuses on a newly designed gesture-based, time-based one-time-pad generation scheme that establishes the OTP on both devices over an out-of-band channel, as a security measure to guard against the STLS (QR code token sniffing) attack. The result is a secure QR-code-based peer-to-peer (P2P) mobile payment system, which merchants can also use to receive payment with only a mobile phone, integrating B2C and P2P payment.
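
The abstract does not describe how the gesture-based OTP is actually generated. As a generic illustration of two devices deriving the same short-lived code from a shared secret and the current time, the sketch below implements the standard time-based one-time password algorithm (RFC 6238, HMAC-SHA1). It is a swapped-in, well-known technique, not the project's scheme; in particular, the assumption that the shared secret would come from the gesture exchange is hypothetical.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """Standard TOTP (RFC 6238, HMAC-SHA1); not the project's own scheme.

    secret: shared key as bytes (assumed here to be agreed out of band).
    at:     Unix time in seconds; defaults to the current time.
    """
    if at is None:
        at = time.time()
    counter = int(at) // step  # both devices compute the same time window
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Payer and payee would each call totp with the same secret; a token sniffed from a QR code is of limited use to an attacker who lacks the secret, because the matching code changes every time step.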

Secure mobile P2P payment system using dynamic color code

Subject Areas

Mobile Application Development; Security


Objectives

To develop a mobile payment system where the payer and payee can conduct a payment transaction using their mobile phones.

Abstract

Proximity-based mobile payment systems have been dominant in the past decade. However, their increasing popularity has come with an increased number of reported property losses due to these systems' vulnerabilities and poor compliance with security protocols. This project offers a solution by integrating an original visual OOB (out-of-band) channel using dynamic color codes, which addresses some of the biggest issues found in leading modern mobile payment systems. The security protocol analysis and the mechanisms used further ensure the security of this system, in the hope of facilitating the generalization of mobile P2P payment.

Navigation System for Visually Impaired

Subject Areas

Computer Vision; Location-Based Service; Machine Learning


Objectives

To identify the nearest traffic light and determine its signal; To interact with iBeacon to determine indoor location; To detect the direction of the guide path

Abstract

In Hong Kong, there are 174,800 people who are blind. They need to use a white cane and follow the tactile guide path when travelling. When they arrive at a complex traffic intersection, they may be confused by the traffic light signal sounds. A seeing-eye dog can of course solve this problem, but it may not be a good solution for them, for the following reasons: training requires a high cost and a long time; not everyone can fulfill the requirements for application; and end users still get lost when they explore new locations. Also, blind people cannot obtain the latest information about different locations and shops, which is very inconvenient for their daily life. In order to solve the above problems, Uto should be developed to serve as a next-generation navigation tool for blind people.

Large Graph Mining: Subgraph Isomorphism

Subject Areas

Algorithms; Big Data; Graph Theory; Parallel Processing


Objectives

To apply graph theory and create an effective graph analytics software to solve real-life problems with AI (cyber security, international human trafficking or microbiome).

Abstract

In this era of big data, data size grows at an unpredictable rate with the growth of the Internet of Things (IoT) in recent years. Data becomes more difficult to manage, and data leakage has become a common risk. Thus, companies are increasingly aware of the necessity of cyber security. To detect suspicious activities among data, one of the most common strategies is to identify internet communities. In graph theory, a clique is a meaningful community structure and a fundamental concept of graph construction. In this project, we introduce a novel joint hierarchical clustering and parallel counting algorithm that can carry out high-performance, large-scale, data-intensive computing to count the number of cliques of different sizes in large graphs. Since the clique decision problem is NP-complete, we initially design the algorithm based on computing the exact number of triangles, then extend and modify it to achieve our ultimate goal. The algorithm consists of three major steps: pruning, hierarchical clustering and parallel counting. It allows a scalable software framework, MapReduce, to calculate in parallel the number of cliques inside each cluster as well as those straddling clusters. We characterize the performance of the algorithm mathematically, and evaluate it using different representative graphs, including random graphs and social networks, to demonstrate its computational efficiency over other state-of-the-art techniques.
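
The starting point named above, computing the exact number of triangles (cliques of size three), can be sketched on a single machine as follows. This is a minimal sequential sketch, not the project's clustered MapReduce algorithm: it omits pruning, hierarchical clustering and parallel counting, and the degree-based edge orientation and the function name count_triangles are choices made for this illustration.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Rank nodes by degree (ties broken by label) and keep, for each node,
    # only its higher-ranked neighbors, so each triangle is found once.
    order = sorted(adj, key=lambda n: (len(adj[n]), repr(n)))
    rank = {node: i for i, node in enumerate(order)}
    forward = {n: {m for m in adj[n] if rank[m] > rank[n]} for n in adj}
    total = 0
    for u in adj:
        for v in forward[u]:
            # Common higher-ranked neighbors of u and v close a triangle.
            total += len(forward[u] & forward[v])
    return total
```

Orienting every edge from the lower-ranked to the higher-ranked endpoint means a triangle is counted only from its lowest-ranked vertex, which avoids dividing by a symmetry factor afterwards.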

Year 2017-2018

Data Deduplication for general user in Application layer

Subject Areas

Algorithms; Content-based Video / Audio / Image Indexing; Cryptography; Data Compression; Mobile Application Development; Network Security


Objectives

To implement different deduplication technologies; To develop a mobile application that is a cloud storage client adopting deduplication technology; To analyze and evaluate different deduplication approaches and apply them on the mobile client.

Abstract

This project mainly focuses on deduplication and on creating a cloud storage client application. It does not analyze new algorithms for deduplication chunking or focus on deduplication performance; rather, it is a comprehensive deduplication application with Nginx, HTTPS, an API server and fully functional deduplication for general users. The application works as a proxy between the cloud storage client and the cloud storage. Every uploaded file is deduplicated into binary data, which is then uploaded to the cloud storage. Users benefit from reduced storage usage on the cloud storage. The client works the same as a normal cloud storage client. Only Dropbox cloud storage is covered in this project.
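
To make the storage saving concrete, the sketch below shows one common form of deduplication: split a file into chunks, store each distinct chunk once under its hash, and keep a per-file list of hashes for reconstruction. It is a minimal sketch under stated assumptions; the abstract does not say which chunking or hashing the application uses, so the fixed 4096-byte chunks, SHA-256, the in-memory dictionary standing in for cloud storage, and the names deduplicate and restore are all hypothetical.

```python
import hashlib


def deduplicate(data, store, chunk_size=4096):
    """Store each unique chunk of data once; return the file's recipe.

    store maps a chunk digest to the chunk bytes (a stand-in for the
    cloud storage). The recipe is the ordered list of chunk digests.
    """
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # upload only if not seen before
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original bytes from a recipe and the chunk store."""
    return b"".join(store[digest] for digest in recipe)
```

For example, a 12 KiB file made of two identical 4 KiB blocks followed by a different one yields a three-entry recipe but only two stored chunks, and restore returns the original bytes.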

Web-based Elderly Monitoring System and Mobile Application with Smart Wear

Subject Areas

Mobile Application; Mobile Application Development


Objectives

To develop a mobile app connecting elderly and their family; To develop a smart wear app for elderly

Abstract

Nowadays, people are busy with work and have no time to take care of their parents. This app can keep track of the elderly's location as well as their heart rate and sleep quality. Many of the elderly do not know how to use a smartphone and may be unable to contact people for help when they are in danger. This smart wear app allows them to make emergency calls and allays the worries of their family members.

Mathematics Learning Mobile Application (iOS, game-based)

Subject Areas

Game Programming; Mobile Computing; Mobile Learning; Mobile Multimedia


Objectives

To develop a mobile application on iOS platform related to Mathematics learning for primary school students; To digitalize various traditional math puzzles such that students can solve math problem with the aid of electronic devices.

Abstract

Mobile learning has become more popular nowadays and is thus an effective way to facilitate studies. This project intends to design an educational application about mathematical concepts on the iOS platform. It is targeted at primary school students for self-learning purposes. Unlike other typical mathematical mobile games, this application focuses on being educational. The application will include not only detailed concept explanations, but also digitalized mathematical or logical puzzles related to those mathematical concepts.

Integrated Pipeline for Phylogenetic Analysis of Vertebrate Gene Families

Subject Areas

Bioinformatics


Objectives

Processed evolutionary process pipeline; Evolution process data visualization; Automatically generated rough report describing the evolution process of a specific gene family

Abstract

Inspired by the example below, we found that the investigation of the evolution process of gene families similar to the lysozyme gene family can be generalized and composed into a module. By turning this process into a pipeline, bioinformatics researchers can finish their studies on other similar mammalian gene families in a much shorter time than is currently required. The output will be visualized as an evolutionary tree, with a detailed description and analysis of the corresponding process. This project involves data processing, coding in C++ and Python, and data visualization. The accomplished module will be uploaded to an integrated bioinformatics platform in the future. The details of the example are as follows. BACKGROUND: Lysozyme c (chicken-type lysozyme) has an important role in host defense, and has been extensively studied as a model in molecular biology, enzymology, protein chemistry, and crystallography. Traditionally, lysozyme c has been considered to be part of a small family that includes genes for two other proteins, lactalbumin, which is found only in mammals, and calcium-binding lysozyme, which is found in only a few species of birds and mammals. More recently, additional testes-expressed members of this family have been identified in human and mouse, suggesting that the mammalian lysozyme gene family is larger than previously known. RESULTS: Here we characterize the extent and diversity of the lysozyme gene family in the genomes of phylogenetically diverse mammals, and show that this family contains at least eight different genes that likely duplicated prior to the diversification of extant mammals. These duplicated genes have largely been maintained, both in intron-exon structure and in genomic context, throughout mammalian evolution. CONCLUSIONS: The mammalian lysozyme gene family is much larger than previously appreciated and consists of at least eight distinct genes scattered around the genome. Since the lysozyme c and lactalbumin proteins have acquired very different functions during evolution, it is likely that many of the other members of the lysozyme-like family will also have diverse and unexpected biological properties.

Interactive Circos, genomic data visualization

Subject Areas

Data Visualization


Objectives

To migrate the Interactive Circos code to the BTDraw platform; To refactor Interactive Circos to make it compatible with BTDraw; To provide an easy-to-use, concise sidebar without losing its highly customizable character; To transform from graph-oriented to mutation-oriented; To provide novel visualization functions

Abstract

In the cancer research field, raw genomic data is large in magnitude but lacks readability. Visualization is a significant method in genomic data analysis. The most frequently used circular layout diagram visualization tool is Circos. Circos is popular for its ability to explore relationships between objects and its flexibility in accommodating multiple layers. However, its drawbacks, such as being Perl-based and requiring complicated configuration files, bring difficulties to researchers. Moreover, its output images suffer from distortion at large scales. To solve these problems, this project aims to develop a mutation-oriented interactive Circos diagram visualization tool (hereafter referred to as Interactive Circos). Interactive Circos focuses on providing high-quality WYSIWYG (What-You-See-Is-What-You-Get) visualization services with sufficient support for common mutation types in the cancer research field. This project is web-based, which guarantees its high accessibility. Besides, this project balances ease of use with a high degree of customization. It allows users to generate aesthetic interactive circular layout diagrams efficiently. Once a mutation file is uploaded, Interactive Circos processes it and renders a diagram quickly. It also gives users the freedom to adjust as many settings of the diagram as they can imagine. The diagram can be exported to an academic standard.

Intelligent Tutoring System for Sudoku with learning analytic capability

Subject Areas

Data Analysis; Mobile Application; intelligent tutoring system


Objectives

To define the problem of existing Sudoku ITS; To review existing Sudoku ITS; To define a conceptual model; To design proposed ITS; To implement the design and functions; To evaluate performance using data collected via Sudoku ITS

Abstract

Sudoku is a logic-based combinatorial number-placement puzzle that has lived through generations without being substituted or forgotten. Despite the popularity of Sudoku, there is only a small amount of research or applications that explain how to play it, and hence some people give up playing when they think they cannot solve it. People neglect the good effects Sudoku can have on developing one's logical thinking. Moreover, even though Intelligent Tutoring Systems (ITS) for Sudoku exist, their rules are not complete and the effectiveness of their hints cannot be measured, so there is a need to address this issue. Nowadays, ITS is increasingly popular, and I believe it can be applied to Sudoku to create a more favourable environment for everyone, regardless of age, to learn and train Sudoku, especially in an Android app given the rapid growth in mobile phone usage. In this project, I will review methodologies related to logical thinking, ITS and data mining for Sudoku on the Android system. The primary feature of the proposed system is to provide adaptive instructions during a Sudoku game, with bilingual interfaces, according to the player's decisions, performance and learning preferences. It collects data on every move in the game and modifies the instructions by measuring the effectiveness of the hints given.

Artificial Intelligence in Music Composing

Subject Areas

Artificial Intelligence; Machine Learning; Multimedia Information Retrieval; Neural Networks


Objectives

To identify different deep learning models for music generation and rebuild them, re-examining the limitations of current research by using uniform training sets for comparison, with the aim of finding the best deep learning architecture for various music styles.

Abstract

Previous work in music generation has mainly focused on creating a neural network for a single music style. More recent work has reported remarkable success with different neural network architectures. My goal is to examine the outcomes of applying multiple neural network architectures to various music styles. In this project, I introduce some basic probabilistic models based on the estimated distribution of musical notes. I use a complete piano roll representation to avoid agnostic learning, and assess the feasibility of performing melody-wise and harmony-wise iteration through feedforward and recurrent neural networks.

Behavioral User Authentication using 3D Gestures

Subject Areas

Authentication; Information Security


Objectives

To find correlations that indicate the uniqueness of one's motion; To create an authentication method using bodily motion; To apply ML algorithms to identify the unique motion of each individual

Abstract

Authentication methods are becoming increasingly complex, and several robust alternatives are being developed to replace alphanumeric passwords. Since the requirements for modern passwords are increasingly demanding, there has been a shift towards authentication schemes that incorporate behavioral characteristics such as gait or a 3D gesture. This project aims to create an authentication scheme based on 3D gestures performed with a smartphone. Current approaches to 3D gesture authentication use spline interpolation functions to compare the similarity of gestures. These approaches ignore important gesture characteristics such as acceleration and gesture speed. To address this, my project implements Dynamic Time Warping to compare 3D gestures and also intends to run a speed similarity check. Dynamic Time Warping allows us to measure the similarity of two gestures irrespective of their speed. The addition of a speed similarity check incorporates the necessary behavioral characteristics into the authentication scheme, thereby decreasing the likelihood of the gesture being easily replicated by an attacker. Each user would have a unique way of reproducing a given gesture, which adds to the complexity of that gesture. This report explains the workings, drawbacks and possible applications of the developed system.
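
As an illustration of the comparison described above, the following is a minimal sketch of Dynamic Time Warping over sequences of 3D points, combined with a simple speed check. It is not the project's implementation: the function names, the thresholds, and the speed check itself (a ratio of sample counts, which assumes both gestures were recorded at the same fixed sampling rate) are assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of 3D points.

    cost[i][j] holds the cheapest alignment of the first i points of `a`
    with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance in 3D
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a point
                cost[i][j - 1],      # b advances, a repeats a point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def speed_ratio(a, b):
    """Hypothetical speed check: ratio of sample counts, in (0, 1].

    Assumes a fixed sampling rate, so fewer samples means a faster gesture.
    """
    return min(len(a), len(b)) / max(len(a), len(b))


def authenticate(template, attempt, max_dtw=1.0, min_speed_ratio=0.7):
    """Accept only if the shape matches (DTW) and the speed is similar.

    Both thresholds are illustrative values, not tuned parameters.
    """
    shape_ok = dtw_distance(template, attempt) <= max_dtw
    speed_ok = speed_ratio(template, attempt) >= min_speed_ratio
    return shape_ok and speed_ok


# The same path performed at half speed (every sample duplicated).
fast = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 1.0)]
slow = [p for p in fast for _ in range(2)]
print(dtw_distance(fast, slow), speed_ratio(fast, slow))
print(authenticate(fast, fast), authenticate(fast, slow))
```

In this example the half-speed copy has a DTW cost of zero, which shows why DTW alone cannot distinguish users who trace the same path at different speeds, and why a separate speed check adds information.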

Machine Learning Application: Classification and Summarization of Legal Documents

Subject Areas

Artificial Intelligence; Data Analysis; Data Science; Machine Learning; Web Application


Objectives

To apply the concept of text mining to the legal industry; To create a model that allows quick and accurate classification and summarization of legal case documents

Abstract

Legal documents are often complicated and difficult for laypeople to understand (Howe and Wogalter, 1994). This research project aims to use text mining and machine learning technology to address this concern. The project is meant to create a machine learning system that produces categorized and summarized information derived from the original legal documents. The simplified document produced by the model is designed to make the legal documents easier to understand. The research aims to build a predictive machine learning model that uses a series of algorithms to form a comprehensive automatic summarization system. Blei, Ng and Jordan's (2003) Latent Dirichlet Allocation algorithm is implemented to identify the major topics of the legal documents. The Word2vec technique (Mikolov et al., 2013) is then applied to convert sentences into vector matrices, generating a feature space for the LexRank algorithm (Erkan and Radev, 2004), which computes a connectivity matrix between sentences based on the IDF-modified cosine formula in order to summarize the corpus. The extracted information is consolidated into a single coherent document at the final stage.
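
The LexRank step can be illustrated with a minimal sketch. It is not the project's pipeline: it assumes plain bag-of-words token lists rather than the Word2vec features mentioned above, and the function names, the similarity threshold and the damping factor are illustrative choices. It computes the IDF-modified cosine between sentences, thresholds it into a connectivity matrix, and ranks sentences by power iteration.

```python
import math
from collections import Counter


def idf_table(sentences):
    """IDF per word, treating each tokenized sentence as a document."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    return {w: math.log(n / df[w]) for w in df}


def idf_modified_cosine(x, y, idf):
    """IDF-modified cosine similarity between two tokenized sentences."""
    tx, ty = Counter(x), Counter(y)
    num = sum(tx[w] * ty[w] * idf[w] ** 2 for w in tx.keys() & ty.keys())
    nx = math.sqrt(sum((tx[w] * idf[w]) ** 2 for w in tx))
    ny = math.sqrt(sum((ty[w] * idf[w]) ** 2 for w in ty))
    return num / (nx * ny) if nx and ny else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by centrality in the thresholded similarity graph."""
    n = len(sentences)
    idf = idf_table(sentences)
    # Connectivity matrix: 1 where similarity exceeds the threshold.
    adj = [
        [1.0 if idf_modified_cosine(sentences[i], sentences[j], idf) > threshold
         else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Normalize each row so it sums to 1 (uniform if a row has no links).
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * adj[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


docs = [
    ["court", "dismissed", "appeal"],
    ["court", "allowed", "appeal", "costs"],
    ["tenant", "paid", "rent"],
]
print(lexrank(docs))
```

A summary would then be formed by selecting the highest-scoring sentences; how many to keep is a separate design choice not covered here.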

Improving the accuracy of low-quality eye tracker

Subject Areas

Computer Vision; Machine Learning


Objectives

To improve the accuracy of a low-quality Eye Tribe tracker

Abstract

Eye tracking refers to the process of measuring eye gaze. An eye tracker is a device for measuring the position of the eye gaze. The increased accuracy and accessibility of eye-tracking technologies in recent years have made them popular in many applications such as web usability, automotive driving and advertising. Recently, new eye tracking applications have also appeared in the HCI area. For example, eye tracking can help people with disabilities use a computer efficiently, as they can switch between applications by moving their eye fixations. However, most traditional hardware eye trackers are inconvenient to deploy in daily life. In recent years, with the rapid advancement of deep learning, some researchers have turned to Convolutional Neural Networks (CNN) for eye tracking, in which the inputs are images of the user's face or eyes and the output is the predicted eye gaze coordinate. In this project, I aim to improve an existing eye tracking model, iTracker from CSAIL, for predicting eye gaze. Since this model is trained for mobile phones and I would like to provide a solution for eye tracking on desktops, I port the model to the computer. Afterwards, I use a Kalman Smoother and a CNN to process the output of iTracker and improve its accuracy, so that it is capable of handling eye tracking tasks in daily scenarios.
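
The smoothing step can be illustrated with a minimal sketch of a Kalman smoother (a forward Kalman filter followed by a Rauch-Tung-Striebel backward pass) applied to each gaze coordinate independently. It is not the project's implementation: the random-walk motion model, the noise variances `q` and `r`, and the function names are assumptions made for the example, and the CNN post-processing stage is not covered.

```python
def kalman_smooth(zs, q=1e-3, r=1e-1):
    """Smooth a 1-D series with a random-walk Kalman filter and RTS pass.

    Model (assumed): x_k = x_{k-1} + w, z_k = x_k + v, where w and v have
    variances q (process noise) and r (measurement noise).
    """
    n = len(zs)
    if n == 0:
        return []
    x_pred, p_pred = [0.0] * n, [0.0] * n  # predicted mean / variance
    x_filt, p_filt = [0.0] * n, [0.0] * n  # filtered mean / variance
    x_pred[0] = x_filt[0] = zs[0]
    p_pred[0] = p_filt[0] = r
    # Forward pass: predict, then correct with each measurement.
    for k in range(1, n):
        x_pred[k] = x_filt[k - 1]
        p_pred[k] = p_filt[k - 1] + q
        gain = p_pred[k] / (p_pred[k] + r)
        x_filt[k] = x_pred[k] + gain * (zs[k] - x_pred[k])
        p_filt[k] = (1 - gain) * p_pred[k]
    # Backward pass: refine each estimate using later measurements.
    x_smooth = x_filt[:]
    for k in range(n - 2, -1, -1):
        c = p_filt[k] / p_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
    return x_smooth


def smooth_gaze(points, q=1e-3, r=1e-1):
    """Smooth a list of (x, y) gaze predictions, one axis at a time."""
    xs = kalman_smooth([p[0] for p in points], q, r)
    ys = kalman_smooth([p[1] for p in points], q, r)
    return list(zip(xs, ys))


# Jittery predictions around a fixation near (10, 5).
raw = [(10.0, 5.0), (12.0, 4.0), (8.0, 6.0), (11.0, 5.5), (9.0, 4.5), (10.0, 5.0)]
print(smooth_gaze(raw))
```

A larger `r` relative to `q` trusts the model more than the measurements and gives heavier smoothing; suitable values would depend on the noise of the underlying gaze predictor.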