The Ultimate Hands-On Hadoop - Tame your Big Data!
Learn all the buzzwords! And install Hadoop.
Tips for Using This Course
If you have trouble downloading Hortonworks...
Warning for Apple M1 Users
Introduction, and install Hadoop on your desktop
The Hortonworks and Cloudera Merger, and how it affects this course.
Hadoop Overview and History
Overview of the Hadoop Ecosystem
Using Hadoop's Core: HDFS and MapReduce
HDFS: What it is, and how it works
Alternate MovieLens download location
Install the MovieLens dataset into HDFS using the Ambari UI
Install the MovieLens dataset into HDFS using the command line
MapReduce: What it is, and how it works
How MapReduce distributes processing
MapReduce example: Break down movie ratings by rating score
Notes on MRJob installation
Installing Python, MRJob, and nano
Code up the ratings histogram MapReduce job and run it
Rank movies by their popularity
Check your results against mine!
Programming Hadoop with Pig
Introducing Ambari
Introducing Pig
Example: Find the oldest movie with a 5-star rating using Pig
Find old 5-star movies with Pig
More Pig Latin
Find the most-rated one-star movie
Pig Challenge: Compare Your Results to Mine!
Programming Hadoop with Spark
Why Spark?
The Resilient Distributed Dataset (RDD)
Find the movie with the lowest average rating - with RDDs
Datasets and Spark 2.0
Find the movie with the lowest average rating - with DataFrames
Movie recommendations with MLlib
Filter the lowest-rated movies by number of ratings
Check your results against mine!
Using Relational Data Stores with Hadoop
What is Hive?
Use Hive to find the most popular movie
How Hive works
Use Hive to find the movie with the highest average rating
Compare your solution to mine.
Integrating MySQL with Hadoop
Cheat Sheet for the following lecture
Install MySQL and import our movie data
Use Sqoop to import data from MySQL to HDFS/Hive
Use Sqoop to export data from Hadoop to MySQL
Using non-relational data stores with Hadoop
Why NoSQL?
What is HBase?
Import movie ratings into HBase
Use HBase with Pig to import data at scale.
Cassandra overview
If you have trouble installing Cassandra...
Installing Cassandra
Write Spark output into Cassandra
MongoDB overview
Install MongoDB, and integrate Spark with MongoDB
Using the MongoDB shell
Choosing a database technology
Choose a database for a given problem
Querying your Data Interactively
Overview of Drill
Setting up Drill
Querying across multiple databases with Drill
Overview of Phoenix
Install Phoenix and query HBase with it
Integrate Phoenix with Pig
Overview of Presto
Install Presto, and query Hive with it.
Query both Cassandra and Hive using Presto.
Managing your Cluster
YARN explained
Tez explained
Use Hive on Tez and measure the performance benefit
Mesos explained
ZooKeeper explained
Simulating a failing master with ZooKeeper
Oozie explained
Set up a simple Oozie workflow
Zeppelin overview
Use Zeppelin to analyze movie ratings, part 1
Use Zeppelin to analyze movie ratings, part 2
Hue overview
Other technologies worth mentioning
Feeding Data to your Cluster
Kafka explained
Setting up Kafka, and publishing some data.
Publishing web logs with Kafka
Flume explained
Set up Flume and publish logs with it.
Set up Flume to monitor a directory and store its data in HDFS
Analyzing Streams of Data
Spark Streaming: Introduction
Analyze web logs published with Flume using Spark Streaming
Monitor Flume-published logs for errors in real time
Exercise solution: Aggregating HTTP access codes with Spark Streaming
Apache Storm: Introduction
Count words with Storm
Flink: An Overview
Counting words with Flink
Designing Real-World Systems
Best Of The Rest
Review: How the pieces fit together
Understanding your requirements
Sample application: consume webserver logs and keep track of top-sellers
Sample application: serving movie recommendations to a website
Design a system to report web sessions per day
Exercise solution: Design a system to count daily sessions
Learning More
Books and online resources
Set up Flume to monitor a directory and store its data in HDFS