In this series of posts, I take a fresh look at Apache Spark and investigate its applicability to a smaller problem (which in time may grow into a “true” big data problem). The companion GitHub project contains the sample code and installation instructions.
The series starts by introducing Spark and the bus timetable case study.
This post describes how Spark is run on a cluster, first locally and then on Amazon AWS.
January 25, 2016
This post describes how Spark can be used to extract and process data from the bus timetable and weather data sources.
January 24, 2016
This post starts by showing how Spark is installed and set up, and then develops a simple test application; a rough sketch of such an application is shown below this entry.
January 23, 2016
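As a hedged illustration of what a minimal Spark test application of this kind might look like (the object name and input path are placeholders of my own, not taken from the post), here is a sketch using the classic word-count pattern with the Scala API:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal self-contained Spark test application (illustrative sketch only).
// It counts word occurrences in a text file and prints a few results.
object WordCountTest {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process on all available cores,
    // which is enough to verify that the installation works.
    val conf = new SparkConf().setAppName("WordCountTest").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("data/sample.txt")      // placeholder input path
      .flatMap(line => line.split("\\s+"))           // split lines into words
      .map(word => (word, 1))                        // pair each word with a count of 1
      .reduceByKey(_ + _)                            // sum counts per word

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```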
This post describes what Spark is and why one might use it, and introduces the case study to which Spark is applied.
January 22, 2016