Algorithmic Foundations of Data Science
Lecture
In the age of "big data" and "advanced analytics", data processing faces new challenges. Queries become more complex and often involve data mining and machine learning tasks, and the scale of the datasets requires new algorithmic approaches.
This course will cover the "theoretical foundations" of modern data processing and analytics. This includes topics from database theory, such as data models, the analysis of query languages, and basic algorithmic and complexity-theoretic questions related to query processing. It also includes topics from algorithmic learning theory, such as basic machine learning algorithms, support vector machines, the PAC model, and the VC dimension. Furthermore, it includes new models of computation on massive datasets, such as the streaming model and the map-reduce paradigm, and algorithms for these models.
We will focus on "computational aspects" of the theory. Statistics, though undoubtedly one of the foundations of data science, will not play a central role in this course.
Homework and Exam
Exercises
There will be two kinds of exercises:
- Biweekly exercises to deepen your understanding of the material presented in the lecture. These exercises will be graded, and successful participation is required for admission to the exam.
- In addition, there are weekly e-tests that will be graded. Sufficient marks in these tests are a prerequisite for participating in the exam.
Exam
There will be a written exam associated with the course. Sufficient marks in the homework assignments and in the e-tests are necessary to participate in the exam. Further details about the exam will be announced via RWTHmoodle.
Prerequisites
The lecture is offered as a bachelor's and master's course. There are no formal prerequisites, apart from a certain scientific and mathematical maturity. Depending on your preparation, some topics will be more accessible than others.
For certain homework exercises, you will need some programming knowledge. You may choose the programming language freely.
Organisation
The lecture will be taught in a "flipped-classroom" format: each week, we will publish several learning videos, each roughly 10–15 minutes long. The content of these videos will be discussed in a weekly discussion session, which will be held in person and streamed live via Zoom.
The exercise class will be offered in an in-person format.
Some, but not all, meetings will be recorded and made available later. This largely depends on whether the respective part of the lecture is "student-focused" and "student-driven" (for example, the Q&A/discussion session, where we don’t want students to hesitate to ask questions because they are being recorded).