The Big Data Omnibus: Hadoop, Spark, Storm and QlikView
Introduction
A Brief Introduction to Hadoop (0:41)
Why is Big Data a Big Deal
The Big Data Paradigm (14:20)
Serial vs Distributed Computing (8:37)
What is Hadoop? (7:25)
HDFS or the Hadoop Distributed File System (11:01)
MapReduce Introduced (11:39)
YARN or Yet Another Resource Negotiator (4:00)
Installing Hadoop in a Local Environment
Hadoop Install Modes (8:32)
Hadoop Standalone mode Install (15:46)
Hadoop Pseudo-Distributed mode Install (11:44)
The MapReduce "Hello World"
The basic philosophy underlying MapReduce (8:49)
MapReduce - Visualized And Explained (9:03)
MapReduce - Digging a little deeper at every step (10:21)
"Hello World" in MapReduce (10:29)
The Mapper (9:48)
The Reducer (7:46)
The Job (12:28)
Run a MapReduce Job
Get comfortable with HDFS (10:59)
Run your first MapReduce Job (14:30)
Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API
Parallelize the reduce phase - use the Combiner (14:40)
Not all Reducers are Combiners (14:31)
How many mappers and reducers does your MapReduce have? (8:23)
Parallelizing reduce using Shuffle And Sort (14:55)
HDFS and YARN
HDFS - Protecting against data loss using replication (15:38)
HDFS - Name nodes and why they're critical (6:54)
HDFS - Checkpointing to backup name node information (11:16)
YARN - Basic components (8:39)
YARN - Submitting a job to YARN (13:16)
YARN - Plug in scheduling policies (14:27)
YARN - Configure the scheduler (12:32)
MapReduce Customizations For Finer Grained Control
Configuring properties of the Job object (13:47)
Setting up your MapReduce to accept command line arguments (12:36)
Customizing the Partitioner, Sort Comparator, and Group Comparator (10:41)
The Tool, ToolRunner and GenericOptionsParser (15:16)
Introduction
A Brief Introduction to Spark (0:47)
Introduction to Spark
What does Donald Rumsfeld have to do with data analysis? (8:45)
Why is Spark so cool? (12:23)
An introduction to RDDs - Resilient Distributed Datasets (9:39)
Built-in libraries for Spark (15:37)
Installing Spark (6:42)
The PySpark Shell (4:51)
Transformations and Actions (13:33)
See it in Action : Munging Airlines Data with PySpark - I (10:13)
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables (8:27)
Resilient Distributed Datasets
RDD Characteristics: Partitions and Immutability (12:35)
RDD Characteristics: Lineage, RDDs know where they came from (6:06)
What can you do with RDDs? (11:09)
Create your first RDD from a file (16:11)
Average distance travelled by a flight using map() and reduce() operations (5:50)
Get delayed flights using filter(), cache data using persist() (5:23)
Average flight delay in one-step using aggregate() (15:10)
Frequency histogram of delays using countByValue() (3:26)
See it in Action : Analyzing Airlines Data with PySpark - II (6:25)
Advanced RDDs: Pair Resilient Distributed Datasets
Special Transformations and Actions (14:45)
Average delay per airport, use reduceByKey(), mapValues() and join() (18:11)
Average delay per airport in one step using combineByKey() (11:53)
Get the top airports by delay using sortBy() (4:34)
Lookup airport descriptions using lookup(), collectAsMap(), broadcast() (14:03)
See it in Action : Analyzing Airlines Data with PySpark - III (4:58)
Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes
Get information from individual processing nodes using accumulators (13:35)
See it in Action : Using an Accumulator variable (2:41)
Long running programs using spark-submit (5:58)
See it in Action : Running a Python script with Spark-Submit (3:58)
Behind the scenes: What happens when a Spark script runs? (14:30)
Running MapReduce operations (13:44)
See it in Action : MapReduce with Spark (2:05)
Introduction
A Brief Introduction to Storm (0:45)
Stream Processing with Storm
How does Twitter compute Trends? (5:44)
Improving Performance using Distributed Processing (5:41)
Building blocks of Storm Topologies (5:40)
Adding Parallelism in a Storm Topology (4:54)
Components of a Storm Cluster (4:08)
Implementing a Hello World Topology
A Simple Hello World Topology (4:13)
Ex 1: Implementing a Spout (11:10)
Ex 1: Implementing a Bolt (4:43)
Ex 1: Submitting the Topology (5:14)
Processing Data using Files
Ex 2: Reading Data from a File (11:38)
Representing Data using Tuples (3:26)
Ex 3: Accessing data from Tuples (9:07)
Ex 4: Writing Data to a File (9:58)
Running a Topology in the Remote Mode
Setting up a Storm Cluster (7:24)
Ex 5: Submitting a topology to the Storm Cluster (7:20)
Adding Parallelism to a Storm Topology
Ex 6: Shuffle Grouping (6:42)
Ex 7: Fields Grouping (4:37)
Ex 8: All Grouping (2:22)
Ex 9: Custom Grouping (5:16)
Ex 10: Direct Grouping (5:39)
Building a Word Count Topology
Ex 11: Building a Word Count Topology (10:04)
Remote Procedure Calls Using Storm
Ex 12: A Storm Topology for DRPC calls (12:48)
Managing Reliability of Topologies
Ex 13: Managing Failures in Spouts (10:32)
Integrating Storm with Different Sources/Sinks
Ex 14: Implementing a Twitter Spout (8:16)
Ex 15: Using an HDFS Bolt (7:17)
Using the Storm Multilang Protocol
Ex 16: Building a Storm Topology using Python (8:26)
Introduction
A Brief Introduction to QlikView (0:33)
Getting Started
Understanding a QlikView Document (7:09)
The In-Memory Data Model (6:49)
Installing the QlikView Desktop Client (2:40)
Loading Data into a QV App
Loading data from a CSV file (14:03)
Loading data from a Database (9:06)
Avoiding Synthetic Keys (10:10)
Removing Circular References (5:19)
Exploring Data using the UI
List Boxes are like Select DISTINCT (5:40)
Table boxes are for Selecting columns (3:37)
Selection interactions in QV (8:01)
Summarizing data with Chart Boxes (15:42)
Data Types in QV : The Dual Format Representation (7:30)
Transforming Data in Load Scripts
Adding calculated fields in the load script (4:41)
Using a variable in the load script (3:48)
Joining tables in memory (3:45)
The Keep keyword (3:24)
Loading data from in-memory tables (5:01)
Inline loads (1:29)
Effectively presenting data
Some useful dashboard elements (9:46)
Grouped Fields (7:46)
Highlighting with Color (3:33)
The total keyword (6:06)
Using Set analysis to override selections (6:04)
Advanced Load Transformations
Mapping Loads (2:48)
Generic Load (4:38)
Appendix
MySQL Installation (7:03)