Taming Big Data with Apache Spark and Python - Hands On!
Getting Started with Spark
Introduction
How to Use This Course
[Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies.
[Activity] Installing the MovieLens Movie Rating Dataset
[Activity] Run your first Spark program! Ratings histogram example.
Spark Basics and Simple Examples
Introduction to Spark
The Resilient Distributed Dataset (RDD)
Ratings Histogram Walkthrough
Key/Value RDDs, and the Average Friends by Age Example
[Activity] Running the Average Friends by Age Example
Filtering RDDs, and the Minimum Temperature by Location Example
[Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
[Activity] Running the Maximum Temperature by Location Example
[Activity] Counting Word Occurrences using flatMap()
[Activity] Improving the Word Count Script with Regular Expressions
[Activity] Sorting the Word Count Results
[Exercise] Find the Total Amount Spent by Customer
[Exercise] Check Your Results, and Now Sort Them by Total Amount Spent.
Check Your Sorted Implementation and Results Against Mine.
Advanced Examples of Spark Programs
[Activity] Find the Most Popular Movie
[Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
Find the Most Popular Superhero in a Social Graph
[Activity] Run the Script - Discover Who the Most Popular Superhero is!
Superhero Degrees of Separation: Introducing Breadth-First Search
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
[Activity] Superhero Degrees of Separation: Review the Code and Run it
Item-Based Collaborative Filtering in Spark, cache(), and persist()
[Activity] Running the Similar Movies Script using Spark's Cluster Manager
[Exercise] Improve the Quality of Similar Movies
Running Spark on a Cluster
Introducing Elastic MapReduce
[Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
Partitioning
Create Similar Movies from One Million Ratings - Part 1
[Activity] Create Similar Movies from One Million Ratings - Part 2
Create Similar Movies from One Million Ratings - Part 3
Troubleshooting Spark on a Cluster
More Troubleshooting, and Managing Dependencies
SparkSQL, DataFrames, and Datasets
Introducing SparkSQL
Executing SQL commands and SQL-style functions on a DataFrame
Using DataFrames instead of RDDs
Other Spark Technologies and Libraries
Introducing MLlib
[Activity] Using MLlib to Produce Movie Recommendations
Analyzing the ALS Recommendations Results
Using DataFrames with MLlib
Spark Streaming
[Activity] Structured Streaming in Python
GraphX
You Made It! Where to Go from Here.
Learning More about Spark and Data Science