Spark is all the rage at the moment (and has been for a while) in the Big Data and Analytics communities, seeing application in all aspects of working with data, from Streaming to Data Science. It offers a very performant, scalable, multi-purpose platform with a very strong user community. In this series I’ll be looking at setting Spark up on a local machine for learning purposes, and at working with the Jupyter notebook environment for data wrangling, munging and visualisation. We’ll also take a quick look at cloud platform offerings and at the basics of some of the extensions to the core Spark platform, such as Spark Structured Streaming and Spark ML. Spark is a very large subject and I won’t be going into too much depth, just enough to give readers a taste of its capabilities and ease of use. There are some fantastic sources of information out there in the Spark community for those interested in a deeper understanding, and I’ll provide references to them along the way.
Part 1: Installing Spark on Windows
Part 3: Installing Jupyter Notebook Kernels
Part 4: Spark on Azure HDInsight
Part 5: Spark on Azure Databricks
Part 6: Spark Core
Part 7: Spark SQL
Part 8: Spark Structured Streaming
Part 9: Spark ML
Part 10: Spark GraphX