Since the early days of computing, processors have been built with ever-increasing clock frequencies and instruction-level optimizations for faster serial code execution, such as instruction-level parallelism (ILP), caches, and speculative execution engines. Software developers and the industry grew used to applications becoming faster simply by upgrading the underlying hardware. For several years now, these rules have no longer held. Moore's law about the ever-increasing number of transistors per die is still valid, but shrinking structure sizes and increasing power consumption force stalled, or even reduced, clock frequencies. As a consequence, serial execution performance no longer improves automatically with the next processor generation.
In the 'many-core era' that has now arrived, additional transistors are used not to speed up serial code paths, but to provide multiple execution engines ('cores') per processor. This turns every desktop, server, and even mobile system into a parallel computer. Exploiting the additional transistors is therefore now the responsibility of software, which makes parallel programming mandatory for all software with scalability demands.
Introduction | 01:21:16
---|---
Course Design | 00:09:20
Course Topics | 00:08:34
Computer Markets | 00:10:51
Laws of this Universe: Pi | 00:18:15
Three Ways of Doing Anything Faster | 00:09:09
Moore's Law | 00:13:13
Instruction Level Parallelism | 00:11:54
Terminology & Fundamental Concepts | 01:24:14
---|---
Recap | 00:10:54
Conventional Wisdoms Replaced | 00:13:54
Terminology | 00:17:57
Deadlock, Livelock, Race Condition | 00:15:27
Multiprocessor | 00:12:36
Shared Memory | 00:13:26
Terminology & Metrics | 01:24:56
---|---
Recap | 00:23:14
Shared Memory | 00:10:29
Shared-Nothing Workload | 00:09:21
Programming Models | 00:05:45
Metrics | 00:15:32
Speedup | 00:09:14
Amdahl's Law | 00:11:21
Metrics | 00:40:28
---|---
About the Assignment | 00:08:43
Recap | 00:14:20
Karp-Flatt Metric | 00:17:25
Workloads | 00:33:46
---|---
Workloads | 00:11:18
Execution Environment Mapping | 00:14:00
Partitioning | 00:08:28
Foster Methodology | 00:37:42
---|---
Recap pt. 1 | 00:08:53
Recap pt. 2 | 00:17:47
Surface-To-Volume Effect | 00:11:02
Shared-Memory Concurrency | 00:42:15
---|---
Concurrency in History | 00:05:57
Cooperating Sequential Processes pt. 1 | 00:11:12
Cooperating Sequential Processes pt. 2 | 00:13:39
Test-and-Set | 00:11:27
Shared-Memory Concurrency (2) | 01:19:08
---|---
Recap | 00:13:59
Test-and-Set | 00:10:02
Coroutines | 00:21:50
Monitors | 00:13:34
Java Example | 00:11:29
High-Level Primitives | 00:08:14
Shared-Memory Hardware | 01:19:24
---|---
About the Assignment | 00:14:27
Recap | 00:17:31
Shared-Memory Hardware | 00:17:37
Parallel Processing | 00:13:09
Pipelining Conflicts | 00:16:40
Parallel Processing | 01:12:54
---|---
Recap | 00:15:00
Simultaneous Multi-Threading | 00:15:00
Chip Multi-Processing | 00:07:23
Multiprocessing | 00:18:43
HyperTransport | 00:16:48
Shared-Memory Hardware & Shared-Memory Programming Models | 01:23:56
---|---
Recap | 00:22:00
PRAM Extensions | 00:10:52
LogP | 00:04:45
Shared-Memory Programming Models | 00:08:34
POSIX Threads | 00:18:45
Java | 00:19:00
Assignment Feedback & C++ | 01:25:55
---|---
Assignments | 00:04:15
Assignment 1: Problems & Solutions... | 00:17:23
64-bit Overflow | 00:18:02
Assignments to Come... | 00:12:02
C++11 | 00:17:20
C++11 Memory Model | 00:16:53
OpenMP | 01:27:34
---|---
Assignment 3 | 00:06:45
Recap | 00:20:38
Threads vs. Tasks | 00:05:46
OpenMP | 00:09:55
OpenMP Language Extensions | 00:18:47
OpenMP Sections | 00:10:57
OpenMP Loop Parallelization | 00:14:46
Cilk | 01:19:42
---|---
Recap C++ | 00:20:53
Recap OpenMP | 00:15:50
Cilk | 00:06:14
Intel Cilk Plus | 00:22:35
Intel Threading Building Blocks | 00:14:10
Advanced Shared-Memory Programming | 01:29:00
---|---
General Remarks | 00:07:50
Advanced Shared-Memory Programming | 00:15:40
Unified Parallel C | 00:08:44
X10 | 00:15:52
Fortran | 00:16:06
Functional Programming | 00:07:07
Clojure | 00:17:41
GPU Computing with OpenCL | 01:19:12
---|---
Data Parallelism and Task Parallelism | 00:13:41
History of GPU Computing | 00:13:59
OpenCL Platform Model | 00:16:23
OpenCL Memory Architecture | 00:08:20
OpenCL Work Item Code | 00:11:12
OpenCL Hello World Example | 00:15:37
Hardware Characteristics & Performance Tuning | 01:24:02
---|---
Assignment Feedback 2 | 00:17:52
Recap | 00:11:05
Dynamic Parallelism: The Vision | 00:14:04
GPU Hardware in Detail | 00:17:35
The Power of GPU Computing | 00:13:43
Use Caching: Local, Texture, Constant | 00:09:43
Shared-Nothing Parallelism | 01:24:15
---|---
Introduction | 00:07:53
Cluster | 00:17:47
Cluster System Classes | 00:16:42
Interconnection Networks | 00:15:36
Bus | 00:15:03
Completely Connected & Star Connected Networks | 00:11:14
Shared-Nothing Parallelism - Theory | 01:21:16
---|---
Recap | 00:15:12
Shared-Nothing Parallelism - Theory | 00:07:57
Communicating Sequential Processes | 00:09:12
CSP: Processes | 00:17:29
CSP Process Description - Choice | 00:10:06
Communication in CSP | 00:12:13
What's the Deal? | 00:09:07
Shared-Nothing Parallelism - MPI | 01:26:52
---|---
Recap | 00:16:57
Linda Model | 00:13:07
Shared-Nothing Parallelism - MPI | 00:14:40
PVM Example | 00:10:22
MPI Communicators | 00:18:53
Deadlocks | 00:12:53
Assignment Feedback 3 | 00:37:30
---|---
Heat Map with OpenMP | 00:08:30
Parallel Decrypt | 00:03:45
Worley Noise | 00:13:19
Assignments to Come... | 00:11:56
Non-Blocking & Collective Communication | 00:42:49
---|---
Recap | 00:08:37
Circular Left Shift Example | 00:09:44
Non-Blocking Communication | 00:07:17
Collective Communication | 00:17:11
MPI | 00:33:59
---|---
Recap | 00:12:55
MPI Prefix Scan | 00:16:28
What Else | 00:04:36
Actors and Channels | 00:37:48
---|---
Actor Model | 00:11:04
Erlang - Ericsson Language | 00:10:59
Concurrent Programming in Erlang | 00:15:45
Erlang, Scala & Go | 01:24:55
---|---
Assignment | 00:12:23
Recap | 00:16:24
Concurrent Programming in Erlang | 00:07:48
Scala | 00:14:12
Object-Oriented Programming in Scala | 00:15:16
Scala Actors and Case Classes | 00:08:08
Go | 00:10:44
Assignment Feedback 4 | 01:23:31
---|---
Assignment 4: Problems & Solutions | 00:06:56
OpenCL - Simple Reductions | 00:16:09
Heat Map with CUDA / OpenCL | 00:25:12
Convolution | 00:08:37
Instructions and Precision | 00:11:21
Parallel Decrypt | 00:15:16
What Will Future Hardware Look Like? | 01:22:33
---|---
Demanding Applications and Problems of the Future | 00:16:12
Future Hardware | 00:15:39
How Would You Program? | 00:13:44
What Will Future Hardware Look Like? | 00:13:42
(R)Evolution | 00:14:47
Accelerators | 00:08:29
Systems | 01:19:16
---|---
Xeon Phi Hardware | 00:17:37
Cilk vs. TBB vs. OpenMP | 00:18:17
Systems | 00:12:35
Super Computer Performance Development | 00:12:27
OpenMC | 00:09:31
Conclusion | 00:08:49
What are the (Computationally) Demanding Problems/Applications of the Future? | 01:24:30
---|---
About the Assignment | 00:03:35
Introduction | 00:17:31
Applications | 00:12:25
Sparse Linear Algebra | 00:13:13
Structured Grid | 00:14:43
MapReduce | 00:14:12
Graphical Models | 00:08:51
What Kind of Programming Model Can Bridge the Gap? | 01:19:27
---|---
Dwarf Popularity | 00:14:40
Hybrid System | 00:15:16
Patterns for Parallel Programming Content | 00:16:52
OpenACC | 00:17:52
The GPU Paging Cache | 00:14:47
Assignment Feedback 5 | 00:23:34
---|---
Assignment 4: Problems & Solutions | 00:10:34
MPI Collective | 00:09:37
N-Queens with Scala | 00:03:23
Summary | 00:56:32
---|---
Parallel Programming in Real Life | 00:02:44
Summary | 00:12:51
Parallelism on Different Levels | 00:11:13
Software View: Concurrency vs. Parallelism | 00:14:36
Concurrent Execution | 00:15:08