Pig For Wrangling Big Data

Extract, Transform and Load data using Pig to harness the power of Hadoop

What's Inside

Course Description

Prerequisites: Working with Pig requires some basic knowledge of the SQL query language and a basic understanding of the Hadoop ecosystem and MapReduce.

Taught by a team that includes two Stanford-educated ex-Googlers and two ex-Flipkart lead analysts. This team has decades of practical experience working with large-scale data processing jobs.

Pig is aptly named: it is omnivorous, it will consume any data that you throw at it, and it will bring home the bacon!

Let's parse that

omnivorous: Pig works with unstructured data. Many of its operations are very SQL-like, but Pig can perform them on data sets that have no fixed schema. Pig is great at wrestling data into a clean form that can be stored in a data warehouse for reporting and analysis.

bring home the bacon: Pig allows you to transform data in a way that makes it structured, predictable and useful, ready for consumption.

What's Covered:

Pig Basics: Scalar and Complex data types (Bags, Maps, Tuples); basic operators such as Load, Dump, Store, Filter, Foreach, Distinct, Limit and Order By; and built-in functions.

Advanced Data Transformations and Optimizations: The mind-bending Nested Foreach; Joins and their optimizations using "parallel", "merge", "replicated" and other keywords; Co-groups and Semi-joins; and debugging using the Explain and Illustrate commands.

Real-world example: Clean up server logs using Pig

Talk to us!

  • Mail us about anything - anything! - and we will always reply :-)

What are the requirements?

  • A basic understanding of SQL and working with data
  • A basic understanding of the Hadoop ecosystem and MapReduce tasks

What am I going to get from this course?

  • Work with unstructured data to extract information, transform it and store it in a usable form
  • Write intermediate-level Pig scripts to munge data
  • Optimize Pig operations that work on large data sets

What is the target audience?

  • Yep! Analysts who want to wrangle large, unstructured data into shape
  • Yep! Engineers who want to parse and extract useful information from large datasets

Certificate Available
2531+ Students
35 Lectures
5+ Hours of Video
Lifetime Access
24/7 Support
Loonycorn

Loonycorn is made up of two individuals, Janani Ravi and Vitthal Srinivasan, who honed their tech expertise at Google and Stanford. The team believes it has distilled the teaching of complicated tech concepts into funny, practical, engaging courses, and it is excited to share its content with eager students.
