Taming Big Data with Apache Spark and Python - Hands On!
Getting Started with Spark
Introduction
How to Use This Course
[Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies.
[Activity] Installing the MovieLens Movie Rating Dataset
[Activity] Run your first Spark program! Ratings histogram example.
Spark Basics and Simple Examples
Introduction to Spark
The Resilient Distributed Dataset (RDD)
Ratings Histogram Walkthrough
Key/Value RDDs, and the Average Friends by Age Example
[Activity] Running the Average Friends by Age Example
Filtering RDDs, and the Minimum Temperature by Location Example
[Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
[Activity] Running the Maximum Temperature by Location Example
[Activity] Counting Word Occurrences using flatMap()
[Activity] Improving the Word Count Script with Regular Expressions
[Activity] Sorting the Word Count Results
[Exercise] Find the Total Amount Spent by Customer
[Exercise] Check your Results, and Now Sort them by Total Amount Spent.
Check Your Sorted Implementation and Results Against Mine.
Advanced Examples of Spark Programs
[Activity] Find the Most Popular Movie
[Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
Find the Most Popular Superhero in a Social Graph
[Activity] Run the Script - Discover Who the Most Popular Superhero is!
Superhero Degrees of Separation: Introducing Breadth-First Search
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
[Activity] Superhero Degrees of Separation: Review the Code and Run it
Item-Based Collaborative Filtering in Spark, cache(), and persist()
[Activity] Running the Similar Movies Script using Spark's Cluster Manager
[Exercise] Improve the Quality of Similar Movies
Running Spark on a Cluster
Introducing Elastic MapReduce
[Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
Partitioning
Create Similar Movies from One Million Ratings - Part 1
[Activity] Create Similar Movies from One Million Ratings - Part 2
Create Similar Movies from One Million Ratings - Part 3
Troubleshooting Spark on a Cluster
More Troubleshooting, and Managing Dependencies
SparkSQL, DataFrames, and Datasets
Introducing SparkSQL
Executing SQL commands and SQL-style functions on a DataFrame
Using DataFrames instead of RDDs
Other Spark Technologies and Libraries
Introducing MLlib
[Activity] Using MLlib to Produce Movie Recommendations
Analyzing the ALS Recommendations Results
Using DataFrames with MLlib
Spark Streaming
[Activity] Structured Streaming in Python
GraphX
You Made It! Where to Go from Here.
Learning More about Spark and Data Science