Data Mining and Machine Learning I - Supervised and Unsupervised Learning

11 weeks from 28 Apr to 11 Jul 2025
1 week break starting 2 June
Closing date: 31 Mar 2025 at 12pm

This course introduces students to machine learning methods and modern data mining techniques, with an emphasis on practical issues and applications.

By the end of this course learners will be able to:

  • apply and interpret methods of dimension reduction such as principal component analysis and the biplot
  • apply and interpret classical methods for cluster analysis
  • apply and interpret a wide range of methods for classification
  • explain and interpret ROC curves and performance measures such as AUC
  • fit support vector machines to data
  • assess predictive ability objectively.

Testimonial:

The content is very interesting. The different ways of examination provided an excellent challenge.

Pre-requisite knowledge

Learners should have prior experience of linear modelling and basic experience with the R programming language (e.g., data management and plotting).

This course is typically taken in year 1 of the MSc in Data Analytics/Data Analytics for Government programme.

This course assumes that you have knowledge and skills comparable to those covered in the following courses. Alternatively, you may wish to consider taking some of the courses listed before attempting this course.

Syllabus

Week 1 (sample material)

  • Dimension reduction in data
  • Principal Component Analysis (PCA)
  • Performing PCA in R and interpreting its output

Week 2

  • Interpreting the biplot
  • Principal Component regression

Week 3

  • Classification
  • Overfitting
  • K-nearest neighbours

Week 4

  • Tree based modelling, bagging and random forests
  • Applying tree based modelling, bagging and random forests in R

Week 5

  • Support vector machines (SVMs)
  • Implementing linear SVMs in R
  • Kernelised SVMs

Mid-term week break

Week 6

  • Peer assessment

Week 7

  • Introduction to Model-Based Classification
  • Linear Discriminant Analysis and Fisher's Discriminant Analysis

Week 8

  • Quadratic and Mixture Model Discriminant Analysis
  • Generative vs. Discriminative Classification Models

Week 9

  • Cluster analysis
  • Reading dendrograms
  • Choosing the number of clusters

Week 10

  • Partitioning cluster analysis
  • K-means clustering
  • Performing k-means clustering in R and interpreting its output

Software

To take our courses please use an up-to-date version of a standard browser (such as Google Chrome, Firefox, Safari, Internet Explorer or Microsoft Edge) and a PDF reader (such as Acrobat Reader).

Learning material will be distributed through Moodle. We encourage all learners to install R and RStudio and we provide detailed installation instructions, but learners can also use free cloud-based services (RStudio Cloud).

Learners need to install Zoom to participate in video conferencing sessions. We recommend using a headset for these sessions.