Apache Spark 2 with Python - Big Data with PySpark and Spark
Section 1: Get Started with Apache Spark
Course Overview (4:09)
Introduction to Spark (2:28)
Install Java and Git (8:53)
Set up Spark (9:22)
Run our first Spark job (3:48)
Section 2: RDD
RDD Basics (2:50)
Create RDDs (2:32)
Map and Filter Transformation (9:29)
Solution to Airports by Latitude Problem (1:57)
FlatMap Transformation (3:46)
Set Operation (8:26)
Solution for the Same Hosts Problem (1:54)
Actions (9:03)
Solution to Sum of Numbers Problem (2:06)
Important Aspects about RDD (1:40)
Summary of RDD Operations (2:26)
Caching and Persistence (5:16)
Section 3: Spark Architecture and Components
Spark Architecture (3:01)
Spark Components (5:26)
Section 4: Pair RDD
Introduction to Pair RDD (1:38)
Create Pair RDDs (4:15)
Filter and MapValue Transformations on Pair RDD (5:16)
Reduce By Key Aggregation (5:38)
Sample solution for the Average House problem (3:24)
Group By Key Transformation (5:15)
Sort By Key Transformation (2:51)
Sample Solution for the Sorted Word Count Problem (3:24)
Data Partitioning (4:18)
Join Operations (5:12)
Section 5: Advanced Spark Topics
Accumulators (3:44)
Solution to StackOverflow Survey Follow-up Problem (1:05)
Broadcast Variables (6:46)
Section 6: Spark SQL
Introduction to Spark SQL (3:56)
Spark SQL in Action (13:12)
Spark SQL practice: House Price Problem (1:54)
Spark SQL Joins (7:03)
Dataframe or RDD (2:57)
Dataframe and RDD Conversion (2:54)
Performance Tuning of Spark SQL (2:51)
Section 7: Running Spark in a Cluster
Introduction to Running Spark in a Cluster (4:05)
spark-submit (2:41)
Run Spark Application on Amazon EMR cluster (15:10)
Dataframe or RDD
Lesson content locked