The amount of data that can be generated and stored in academic and industrial projects and applications is increasing rapidly. Big data analytics technologies have established themselves as a solution to the scalability problems that traditional database systems face with such data. The vast amounts of newly collected data, however, are usually not as easy to analyze as the curated, structured data in a data warehouse. Typically, these data are noisy, vary in format and velocity, and need to be analyzed with techniques from statistics and machine learning rather than with pure SQL-like aggregations and drill-downs. Moreover, the results of the analyses are frequently models that are used for decision making and prediction. The complete process of big data analysis is described as a pipeline that includes data recording, cleaning, integration, modeling, and interpretation.
In this lecture, we will discuss big data systems, i.e., infrastructures that are used to handle all steps in typical big data processing pipelines.
Introduction | 01:15:49 | |
---|---|---|
Big Data | 00:24:36 | |
Data Science | 00:20:58 | |
Course Logistics | 00:30:15 |
Database Systems Recap | 01:29:32 | |
---|---|---|
Announcements | 00:07:15 | |
Relational Databases | 00:21:30 | |
ER Model, Relational Schema and Instance | 00:09:41 | |
Normal Forms | 00:13:02 | |
Delete Anomaly | 00:14:43 | |
Structured Query Language | 00:23:21 |
RDBMS Internals | 01:12:54 | |
---|---|---|
Memory Hierarchy | 00:13:08 | |
Bottom Up | 00:11:26 | |
Access Methods | 00:20:32 | |
Hashing | 00:17:47 | |
Query Processing | 00:10:01 |
Big Data Stack | 01:30:37 | |
---|---|---|
Quick Survey | 00:11:41 | |
Big Data Stack and its Evolution | 00:22:36 | |
Applications | 00:00:00 | |
Big Data Processing Evolution | 00:56:20 |
Benchmarking and Measurement | 01:27:04 | |
---|---|---|
Why Measure? | 00:26:20 | |
Basic Terminology | 00:28:01 | |
Sample vs Population | 00:12:58 | |
Paired Observation | 00:19:45 |
Benchmarks | 01:27:59 | |
---|---|---|
Announcements and Recap | 00:11:14 | |
Benchmarks | 00:16:45 | |
TPC to the rescue! | 00:14:38 | |
BigBench | 00:22:05 | |
Data Centers | 00:23:17 |
Cloud Computing | 01:28:53 | |
---|---|---|
Virtualization | 00:29:07 | |
Scheduling | 00:32:21 | |
Cloud Services | 00:27:25 |
Distributed File Systems | 01:29:13 | |
---|---|---|
File Systems | 00:30:33 | |
Network File System | 00:13:47 | |
Google File System | 00:22:21 | |
Hadoop Distributed File System | 00:22:32 |
Map Reduce | 01:21:21 | |
---|---|---|
MapReduce | 00:20:05 | |
Shuffling / Sorting Stage | 00:21:51 | |
Multi-Phase MMS | 00:21:39 | |
Fault Tolerance | 00:17:46 |
Map Reduce 2 | 01:28:01 | |
---|---|---|
Map Reduce Stack | 00:22:57 | |
SQL on MR | 00:26:00 | |
Apache Spark | 00:39:04 |
Wide Column Stores | 01:26:35 | |
---|---|---|
Key-Value Stores | 00:18:37 | |
Data Model Design Principles | 00:19:03 | |
Common Properties of KV-Stores | 00:21:25 | |
Three-Phase Commit | 00:10:53 | |
CAP Theorem - Overview | 00:16:37 |
Key Value Stores | 01:30:15 | |
---|---|---|
Data Storage | 00:31:13 | |
Distributed Architecture | 00:39:24 | |
BigTable / HBase | 00:19:38 |
Key Value Stores & Stream Processing Systems I | 01:25:38 | |
---|---|---|
BigTable / HBase | 00:35:32 | |
Cassandra | 00:55:01 | |
Stream Processing | 00:30:37 |
Databases On Modern Hardware | 01:25:10 | |
---|---|---|
Traditional Database Systems | 00:10:12 | |
In-Memory Databases On Modern Hardware | 00:18:54 | |
Performance Limitation of Modern Processors | 00:17:09 | |
Hazards | 00:00:00 | |
Prediction | 00:00:00 |
Stream Processing Systems I - Part 2 | 01:30:26 | |
---|---|---|
Processing Windows | 00:12:31 | |
Windowed Join | 00:28:34 | |
Efficient Window Aggregation | 00:16:22 | |
What Makes a System a Stream Processing System | 00:32:59 |
Ad-hoc Stream Query Processing & Stream Processing Systems 1 | 01:28:26 | |
---|---|---|
Challenges | 00:25:04 | |
AJoin Architecture | 00:14:59 | |
Join Reordering | 00:20:13 | |
Apache Storm | 00:28:10 |
Machine Learning Systems - Introduction | 01:24:35 | |
---|---|---|
Motivation | 00:26:13 | |
ML Systems - Overview | 00:37:08 | |
Stack of ML Systems | 00:21:14 |
Machine Learning Systems - Introduction - Part 2 | 01:26:14 | |
---|---|---|
Announcements | 00:02:27 | |
Stack of ML Systems | 00:18:42 | |
Language Abstractions & System Architectures | 00:27:20 | |
Execution Strategies | 00:03:56 | |
Data Parallel Execution | 00:20:53 | |
Task Parallel Execution | 00:06:48 | |
Data-Parallel Parameter Server | 00:06:08 |
Graph Database Systems | 01:29:25 | |
---|---|---|
Introduction | 00:14:30 | |
Comparison to RDBMS | 00:30:40 | |
Representation of Graphs | 00:12:28 | |
Query Languages | 00:31:47 |
Graph Processing Systems | 01:18:55 | |
---|---|---|
Introduction | 00:16:58 | |
Applications | 00:04:55 | |
Neo4j | 00:07:47 | |
The Algorithmic Perspective | 00:46:15 | |
Summary | 00:03:00 |
Exam Preparation | 00:25:47 | |
---|---|---|
Q&A Session | 00:25:47 |