The Ultimate Hands-On Hadoop - Tame your Big Data!
Learn all the buzzwords! And install Hadoop.
Tips for Using This Course (1:09)
If you have trouble downloading Hortonworks...
Warning for Apple M1 Users
Introduction, and install Hadoop on your desktop (19:01)
The Hortonworks and Cloudera Merger, and how it affects this course. (3:01)
Hadoop Overview and History (7:44)
Overview of the Hadoop Ecosystem (16:48)
Using Hadoop's Core: HDFS and MapReduce
HDFS: What it is, and how it works (13:56)
Alternate MovieLens download location
Install the MovieLens dataset into HDFS using the Ambari UI (6:22)
Install the MovieLens dataset into HDFS using the command line (7:52)
MapReduce: What it is, and how it works (10:42)
How MapReduce distributes processing (12:59)
MapReduce example: Break down movie ratings by rating score (11:37)
Notes on MRJob installation
Installing Python, MRJob, and nano (13:19)
Code up the ratings histogram MapReduce job and run it (7:36)
Rank movies by their popularity (7:06)
Check your results against mine! (8:25)
Programming Hadoop with Pig
Introducing Ambari (9:49)
Introducing Pig (6:27)
Example: Find the oldest movie with a 5-star rating using Pig (15:09)
Find old 5-star movies with Pig (9:42)
More Pig Latin (7:36)
Find the most-rated one-star movie (1:56)
Pig Challenge: Compare Your Results to Mine! (5:39)
Programming Hadoop with Spark
Why Spark? (10:08)
The Resilient Distributed Dataset (RDD) (10:14)
Find the movie with the lowest average rating - with RDDs (15:33)
Datasets and Spark 2.0 (6:30)
Find the movie with the lowest average rating - with DataFrames (10:02)
Movie recommendations with MLlib (12:43)
Filter the lowest-rated movies by number of ratings (2:52)
Check your results against mine! (6:42)
Using Relational Data Stores with Hadoop
What is Hive? (6:33)
Use Hive to find the most popular movie (10:45)
How Hive works (9:12)
Use Hive to find the movie with the highest average rating (1:56)
Compare your solution to mine. (4:12)
Integrating MySQL with Hadoop (8:02)
Cheat Sheet for the following lecture
Install MySQL and import our movie data (7:45)
Use Sqoop to import data from MySQL to HDFS/Hive (7:01)
Use Sqoop to export data from Hadoop to MySQL (7:16)
Using non-relational data stores with Hadoop
Why NoSQL? (13:57)
What is HBase? (12:57)
Import movie ratings into HBase (13:30)
Use HBase with Pig to import data at scale. (11:21)
Cassandra overview (14:53)
If you have trouble installing Cassandra...
Installing Cassandra (10:53)
Write Spark output into Cassandra (11:00)
MongoDB overview (17:19)
Install MongoDB, and integrate Spark with MongoDB (12:44)
Using the MongoDB shell (7:48)
Choosing a database technology (16:01)
Choose a database for a given problem (5:02)
Querying your Data Interactively
Overview of Drill (7:57)
Setting up Drill (10:58)
Querying across multiple databases with Drill (7:09)
Overview of Phoenix (8:57)
Install Phoenix and query HBase with it (7:02)
Integrate Phoenix with Pig (11:47)
Overview of Presto (6:41)
Install Presto, and query Hive with it. (12:29)
Query both Cassandra and Hive using Presto. (9:03)
Managing your Cluster
YARN explained (10:03)
Tez explained (4:58)
Use Hive on Tez and measure the performance benefit (8:37)
Mesos explained (7:15)
ZooKeeper explained (13:12)
Simulating a failing master with ZooKeeper (6:49)
Oozie explained (11:58)
Set up a simple Oozie workflow (16:54)
Zeppelin overview (5:04)
Use Zeppelin to analyze movie ratings, part 1 (12:28)
Use Zeppelin to analyze movie ratings, part 2 (9:48)
Hue overview (8:08)
Other technologies worth mentioning (4:37)
Feeding Data to your Cluster
Kafka explained (9:50)
Setting up Kafka, and publishing some data. (7:24)
Publishing web logs with Kafka (10:21)
Flume explained (10:18)
Set up Flume and publish logs with it. (7:46)
Set up Flume to monitor a directory and store its data in HDFS (9:14)
Analyzing Streams of Data
Spark Streaming: Introduction (14:29)
Analyze web logs published with Flume using Spark Streaming (14:20)
Monitor Flume-published logs for errors in real time (2:02)
Exercise solution: Aggregating HTTP access codes with Spark Streaming (4:26)
Apache Storm: Introduction (9:29)
Count words with Storm (15:49)
Flink: An Overview (6:55)
Counting words with Flink (10:20)
Designing Real-World Systems
Best Of The Rest (9:26)
Review: How the pieces fit together (6:31)
Understanding your requirements (8:04)
Sample application: consume webserver logs and keep track of top-sellers (10:08)
Sample application: serving movie recommendations to a website (11:20)
Design a system to report web sessions per day (2:53)
Exercise solution: Design a system to count daily sessions (4:26)
Learning More
Books and online resources (5:32)