Advanced MapReduce in Hadoop

A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel"

What's Inside

Course Description

Taught by a four-person team that includes two Stanford-educated ex-Googlers and two ex-Flipkart Lead Analysts. The team has decades of practical experience working with Java and with billions of rows of data.

This course digs deep into the features Hadoop provides for controlling and customizing your MapReduce jobs at a very granular level.

What's Covered:

Lots of cool stuff:

  • MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort
  • Using MapReduce to:
    • Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine.
    • Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text.
  • Customize your MapReduce Jobs:
    • Chain multiple MR jobs together
    • Write your own Customized Partitioner
    • Total Sort: Globally sort a large amount of data by sampling input files
    • Secondary sorting
    • Unit tests with MRUnit
    • Integrate with Python using the Hadoop Streaming API
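To give a feel for the bigram exercise listed above, here is a minimal sketch in plain Java that mimics the map, shuffle, and reduce phases in memory. It does not use the Hadoop API and is not the course's own code; the class name `BigramSketch` and its method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the MapReduce phases (no Hadoop dependency).
class BigramSketch {

    // "Map" phase: emit every pair of adjacent words found in one line of text.
    static List<String> map(String line) {
        String[] words = line.toLowerCase().trim().split("\\s+");
        List<String> bigrams = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            bigrams.add(words[i] + " " + words[i + 1]);
        }
        return bigrams;
    }

    // "Shuffle and sort" phase: group the emitted 1s by bigram, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String bigram : map(line)) {
                grouped.computeIfAbsent(bigram, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // "Reduce" phase: sum the counts collected for a single bigram.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs the three phases end to end and returns bigram -> frequency.
    static Map<String, Integer> countBigrams(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(lines).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("the quick brown fox", "the quick red fox");
        System.out.println(countBigrams(corpus));
    }
}
```

In a real Hadoop job the grouping step is done by the framework across machines; the sketch only shows how the work splits into an independent per-line map step and a per-key reduce step.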

Mail us about anything - anything! - and we will always reply :-)

What are the requirements?

  • You'll need an IDE where you can write Java code or open the source code that's shared. IntelliJ and Eclipse are both great options.
  • You'll need some background in Object-Oriented Programming, preferably in Java. All the source code is in Java, and we dive right in without covering objects, classes, etc.
  • A bit of exposure to Linux/Unix shells would be helpful, but it won't be a blocker.

What am I going to get from this course?

  • Develop advanced MapReduce applications to process Big Data
  • Use Hadoop + MapReduce to solve a wide variety of problems, from NLP to inverted indices
  • Implement advanced tasks like custom partitioning, total sort and secondary sort

What is the target audience?

  • Yep! Analysts who want to leverage the power of HDFS where traditional databases don't cut it anymore
  • Yep! Engineers who want to develop complex distributed computing applications to process lots of data
  • Yep! Data Scientists who want to add MapReduce to their bag of tricks for processing data


Certificate Available
308+ Students
24 Lectures
4+ Hours of Video
Lifetime Access
24/7 Support
Loonycorn

Loonycorn is made up of a couple of individuals, Janani Ravi and Vitthal Srinivasan, who have honed their tech expertise at Google and Stanford. The team believes it has distilled the instruction of complicated tech concepts into funny, practical, engaging courses, and is excited to be sharing its content with eager students.
