From 0 to 1 : Spark for Data Science with Python
You, This Course and Us
You, This Course and Us (2:15)
Introduction to Spark
What does Donald Rumsfeld have to do with data analysis? (8:45)
Why is Spark so cool? (12:23)
An introduction to RDDs - Resilient Distributed Datasets (9:39)
Built-in libraries for Spark (15:37)
Installing Spark (6:42)
The PySpark Shell (4:51)
Transformations and Actions (13:33)
See it in Action : Munging Airlines Data with PySpark - I (10:13)
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables (8:25)
Resilient Distributed Datasets
RDD Characteristics: Partitions and Immutability (12:35)
RDD Characteristics: Lineage, RDDs know where they came from (6:06)
What can you do with RDDs? (11:09)
Create your first RDD from a file (16:11)
Average distance travelled by a flight using map() and reduce() operations (5:50)
Get delayed flights using filter(), cache data using persist() (5:23)
Average flight delay in one step using aggregate() (15:10)
Frequency histogram of delays using countByValue() (3:26)
See it in Action : Analyzing Airlines Data with PySpark - II (6:25)
Advanced RDDs: Pair Resilient Distributed Datasets
Special Transformations and Actions (14:45)
Average delay per airport, use reduceByKey(), mapValues() and join() (18:11)
Average delay per airport in one step using combineByKey() (11:53)
Get the top airports by delay using sortBy() (4:34)
Lookup airport descriptions using lookup(), collectAsMap(), broadcast() (14:03)
See it in Action : Analyzing Airlines Data with PySpark - III (4:58)
Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes
Get information from individual processing nodes using accumulators (13:35)
See it in Action : Using an Accumulator variable (2:41)
Long running programs using spark-submit (5:58)
See it in Action : Running a Python script with Spark-Submit (3:58)
Behind the scenes: What happens when a Spark script runs? (14:30)
Running MapReduce operations (13:44)
See it in Action : MapReduce with Spark (2:05)
Java and Spark
The Java API and Function objects (15:59)
Pair RDDs in Java (4:49)
Running Java code (3:49)
Installing Maven (2:20)
See it in Action : Running a Spark Job with Java (5:08)
PageRank: Ranking Search Results
What is PageRank? (16:44)
The PageRank algorithm (6:15)
Implement PageRank in Spark (12:01)
Join optimization in PageRank using Custom Partitioning (7:27)
See it in Action : The PageRank algorithm using Spark (3:46)
Spark SQL
DataFrames: RDDs + Tables (16:05)
See it in Action : DataFrames and Spark SQL (4:50)
MLlib in Spark: Build a recommendations engine
Collaborative filtering algorithms (12:19)
Latent Factor Analysis with the Alternating Least Squares method (11:39)
Music recommendations using the Audioscrobbler dataset (7:51)
Implement code in Spark using MLlib (16:05)
Spark Streaming
Introduction to streaming (9:55)
Implement stream processing in Spark using DStreams (10:54)
Stateful transformations using sliding windows (9:26)
See it in Action : Spark Streaming (4:17)
Graph Libraries
The Marvel social network using Graphs (18:01)
Frequency histogram of delays using countByValue()