Spark is an open-source, distributed analytics engine that is popular with developers, data analysts, and data scientists because it is easy and intuitive to use. Spark 2.x offers improvements in performance, efficiency, and developer APIs over the original versions of Spark.
In addition to batch data, Spark has powerful support for continuous applications, i.e. streaming data that is constantly updated and changes in real time. Such a dataset is effectively unbounded: it keeps growing as new data arrives. Spark 2 DataFrames let you work with these unbounded datasets in a natural and intuitive manner.
Here is what is covered in this course:
Streaming architectures: Understanding how to work with unbounded datasets
DStreams vs. Structured Streaming: Understanding how Spark 2 processes streams
Triggers and output modes: Determining when transformations are performed and how data sinks are updated
Grouping and aggregations on streams: Perform Spark transformations on continuous data
Sliding and tumbling windows: Partition streams using windows to perform aggregations
Timestamps, watermarks and late data: Learn to work with event time, ingestion time and processing time
Streaming data from Twitter: Perform analysis on real-world streams
Joins and windowed joins: Perform join operations on batches and streams
Kafka integration: Connect Spark with Kafka to consume tweets and perform analysis
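To give a feel for the "sliding and tumbling windows" topic above, here is a minimal sketch in plain Python 3. It is not Spark code and not taken from the course; the function names (`window_starts`, `count_per_window`) and the sample event times are hypothetical, chosen only for illustration. It assumes windows are half-open intervals [start, start + size) whose start times are multiples of the slide interval. A tumbling window is the special case where the slide equals the window size, so each event lands in exactly one window; with a smaller slide, windows overlap and an event is counted in several of them.

```python
from collections import Counter


def window_starts(event_time, size, slide):
    """Return the start of every window [start, start + size) containing event_time.

    Assumption for this sketch: window starts are multiples of `slide`.
    """
    # Latest window start that is not after the event.
    start = (event_time // slide) * slide
    starts = []
    # Walk backwards while the window still covers the event.
    while start > event_time - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


def count_per_window(event_times, size, slide):
    """Count events per window; slide == size gives tumbling windows."""
    counts = Counter()
    for t in event_times:
        for start in window_starts(t, size, slide):
            counts[(start, start + size)] += 1
    return dict(counts)


# Hypothetical event times, e.g. seconds since the stream started.
events = [1, 4, 7, 12]

# Tumbling: non-overlapping 10-second windows.
tumbling = count_per_window(events, size=10, slide=10)

# Sliding: 10-second windows that advance every 5 seconds, so they overlap.
sliding = count_per_window(events, size=10, slide=5)

print(tumbling)
print(sliding)
```

In this sketch each event is counted once under tumbling windows and twice under the sliding configuration, because every point in time is covered by size / slide = 2 overlapping windows.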
This course is built around hands-on demos using real-world datasets. You'll be analyzing data from restaurants listed on Zomato and real-time Twitter data!
At the end of this course, you will be comfortable performing big data analysis on streaming data from multiple sources using Spark 2.
Software used: Spark 2.3, Python 3
Loonycorn comprises two individuals, Janani Ravi and Vitthal Srinivasan, who have honed their tech expertise at Google and Stanford. The team believes it has distilled the instruction of complicated tech concepts into funny, practical, engaging courses, and is excited to be sharing its content with eager students.