Learn the latest Big Data technology, Spark, and learn to use it with one of the most popular programming languages, Python!
One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! Top technology companies and organizations like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!
Spark can run up to 100x faster than Hadoop MapReduce on in-memory workloads, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market!
This course will teach the basics with a crash course in Python, then continue on to using Spark DataFrames with the latest Spark 2.0 syntax! Once we’ve done that, we’ll go through how to use the MLlib Machine Learning Library with the DataFrame syntax and Spark. All along the way you’ll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem!
We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! This course also has a full 30-day money-back guarantee and comes with a LinkedIn Certificate of Completion!
Who is this course for
- Someone who knows Python and would like to learn how to use it for Big Data.
- Someone who is very familiar with another programming language and needs to learn Spark.
Necessary preparation
- General Programming Skills in any Language (Preferably Python).
- 20 GB of free space on your local computer (or alternatively a strong internet connection for AWS).
The Program
- Introduction to Course.
- Setting up Python with Spark.
- Databricks Setup.
- Local VirtualBox Setup.
- AWS EC2 PySpark Setup.
- AWS EMR Cluster Setup.
- Python Crash Course.
- Spark DataFrame Basics.
- Spark DataFrame Project Exercise.
- Introduction to Machine Learning with MLlib.
- Linear Regression.
- Logistic Regression.
- Decision Trees and Random Forests.
- K-means Clustering.
- Collaborative Filtering for Recommender Systems.
- Natural Language Processing.
- Spark Streaming with Python.
- Bonus.
What will you learn
- Use Python and Spark together to analyze Big Data.
- Work on Consulting Projects that mimic real-world situations!
- Use Spark with Random Forests for Classification.
- Use Spark’s MLlib to create Powerful Machine Learning Models.
- Get set up on Amazon Web Services EC2 for Big Data Analysis.
- Learn how to leverage the power of Linux with a Spark Environment!
- Use Spark Streaming to Analyze Tweets in Real Time!
- Learn how to use the new Spark 2.0 DataFrame Syntax.
- Classify Customer Churn with Logistic Regression.
- Learn how to use Spark’s Gradient Boosted Trees.
- Learn about the Databricks Platform!
- Learn how to use AWS Elastic MapReduce Service!
- Create a Spam filter using Spark and Natural Language Processing!