Pig For Wrangling Big Data
You, This Course and Us
You, This Course and Us (1:46)
Where does Pig fit in?
Pig and the Hadoop ecosystem (9:37)
Install and set up (8:50)
How does Pig compare with Hive? (10:15)
Pig Latin as a data flow language (6:17)
Pig with HBase (5:18)
Pig Basics
Operating modes, running a Pig script, the Grunt shell (9:52)
Loading data and creating our first relation (8:45)
Scalar data types (9:55)
Complex data types - The Tuple, Bag and Map (13:45)
Partial schema specification for relations (10:00)
Displaying and storing relations - The dump and store commands
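The basics above (loading data, declaring a schema with scalar and complex types, then dumping or storing a relation) can be sketched in a few lines of Pig Latin. The file path, delimiter, and field names here are assumptions for illustration, not from the course materials.

```pig
-- Load a comma-delimited file into a relation with an explicit schema
-- (the path 'users.csv' and the fields are hypothetical).
users = LOAD 'users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int);

-- DUMP prints the relation to the console; STORE writes it to disk.
DUMP users;
STORE users INTO 'users_out' USING PigStorage(',');
```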
Pig Operations and Data Transformations
Selecting fields from a relation (10:22)
Built-in functions (5:08)
Evaluation functions (10:31)
Using the distinct, limit and order by keywords (5:04)
Filtering records based on a predicate (11:01)
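A minimal sketch combining the operations this section covers: selecting fields, filtering on a predicate, and the distinct, order by, and limit keywords. It assumes a relation like the hypothetical `users` schema above.

```pig
-- Assumed input relation and schema (hypothetical).
users  = LOAD 'users.csv' USING PigStorage(',')
         AS (id:int, name:chararray, age:int);

-- Filter records by a predicate, then project a single field.
adults = FILTER users BY age >= 18;
names  = FOREACH adults GENERATE name;

-- Deduplicate, sort, and cap the result.
uniq   = DISTINCT names;
sorted = ORDER uniq BY name ASC;
top10  = LIMIT sorted 10;
DUMP top10;
```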
Advanced Data Transformations
Group by and aggregate transformations (12:12)
Combining datasets using Join (16:19)
Concatenating datasets using Union (4:32)
Generating multiple records by flattening complex fields (5:24)
Using Co-Group, Semi-Join and Sampling records (9:26)
The nested Foreach command (13:47)
Debug Pig scripts using Explain and Illustrate (12:55)
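The group-by, join, and flatten transformations in this section might look like the following sketch; the datasets and field names are assumed for illustration.

```pig
users  = LOAD 'users.csv'  USING PigStorage(',') AS (id:int, name:chararray, age:int);
orders = LOAD 'orders.csv' USING PigStorage(',') AS (user_id:int, amount:double);

-- Group by a key and aggregate: GROUP yields (group, bag-of-records).
by_age = GROUP users BY age;
counts = FOREACH by_age GENERATE group AS age, COUNT(users) AS n;

-- Inner join two relations on their keys.
joined = JOIN users BY id, orders BY user_id;

-- FLATTEN un-nests a bag, generating one output record per element.
flat   = FOREACH by_age GENERATE group, FLATTEN(users.name);

-- EXPLAIN shows the logical/physical/MapReduce plan for debugging.
EXPLAIN counts;
```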
Optimizing Data Transformations
Parallelize operations using the Parallel keyword (8:02)
Join Optimizations: Multiple relations join, large and small relation join (10:34)
Join Optimizations: Skew join and sort-merge join (8:51)
Common sense optimizations (5:25)
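Two of the optimizations named above can be sketched directly in Pig Latin: the PARALLEL keyword to set the number of reducers, and a replicated join for the large-relation/small-relation case. Paths and schemas are assumptions.

```pig
big   = LOAD 'big_dataset'   AS (key:int, val:chararray);
small = LOAD 'small_lookup'  AS (key:int, label:chararray);

-- Replicated join: the small relation is loaded into memory on each
-- mapper, avoiding a reduce phase (only valid if it fits in memory).
j = JOIN big BY key, small BY key USING 'replicated';

-- PARALLEL sets the reducer count for a reduce-side operation.
g = GROUP big BY key PARALLEL 10;
```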
A real-world example
Parsing server logs (7:55)
Summarizing error logs (8:47)
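A hedged sketch of the kind of log-parsing pipeline this section builds: extract fields from raw log lines with a regular expression, then filter and count error records. The log format, regex, and status code are illustrative assumptions, not the course's actual script.

```pig
-- Each input record is one raw log line (path is hypothetical).
logs = LOAD 'access.log' AS (line:chararray);

-- REGEX_EXTRACT_ALL returns a tuple of the capture groups;
-- FLATTEN turns that tuple into top-level fields.
parsed = FOREACH logs GENERATE FLATTEN(
           REGEX_EXTRACT_ALL(line, '^(\\S+)\\s+\\S+\\s+\\S+\\s+\\[[^\\]]+\\]\\s+"\\S+\\s+(\\S+)[^"]*"\\s+(\\d+)'))
         AS (ip:chararray, url:chararray, status:chararray);

-- Keep only error responses and summarize them per URL.
errors  = FILTER parsed BY status == '404';
by_url  = GROUP errors BY url;
summary = FOREACH by_url GENERATE group AS url, COUNT(errors) AS hits;
DUMP summary;
```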
Installing Hadoop in a Local Environment
Hadoop Install Modes (8:32)
Setup a Virtual Linux Instance (For Windows users) (15:31)
Hadoop Standalone mode Install (9:33)
Hadoop Pseudo-Distributed mode Install (14:25)
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables (8:25)