About This Course

What should you do when you face a huge amount of complicated data from real-life applications? This course introduces the core techniques of big data analytics, namely knowledge discovery in databases (KDD), also known as data mining (DM). It focuses on principles, fundamental algorithms, implementations, and applications.

Prerequisites
  1. A thorough understanding of, and practical skills in, data structures such as linked data structures, B-trees, and hash functions.
  2. Analysis of algorithms and time complexity.
  3. Operating systems, including main memory and disk management, and file systems.
  4. Elementary probability theory and statistics, such as random variables, distributions, probability mass functions, sampling, and statistical tests.

Textbook and references
  1. (Official textbook) Data Mining: Concepts and Techniques (3rd ed.), Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011. Available in hard copy and as a PDF.

Teaching format
  1. The video lectures will be pre-recorded and posted on YouTube as unlisted videos, with links provided on this webpage. You should watch each video on or before its specified date.
  2. The class will meet online via Zoom every Thursday afternoon, 3:30-4:20 pm. During these meetings we will discuss assignments and projects, run quizzes, and hold office hours.
  3. We will use Piazza for Q&A.

Last modified on September 8, 2020.