This week we are looking at a container for Apache Spark. Spark is a cluster-computing framework for data processing, used for MapReduce-style jobs and, more recently, for machine learning, graph analysis and streaming analytics. Clustered systems can be difficult to run on a single machine such as a laptop or desktop, because developers often give this use case a low priority. Luckily, the gettyimages/spark image is available for those who wish to explore the Spark environment quickly and easily.
Download the gettyimages/spark image using docker pull. Since Spark is a JVM-based project, the container image is quite large at 715 MB. To execute a standalone version of the Spark shell inside a container, run the following command:
$ docker run --rm -it -p 4040:4040 \
    gettyimages/spark bin/spark-shell
The docker run command brings up an interactive Spark shell attached to your terminal, and the Spark shell application UI is exposed as a web interface on port 4040. Different aspects of the Spark environment can be viewed using the UI. Point your browser at http://localhost:4040 and have a look around.
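The shell takes a few seconds to start, so the UI may not answer straight away. The following is a minimal sketch of a polling helper, assuming curl is installed on the host. The name wait_for_ui and its three arguments are hypothetical and are not part of Spark or Docker.

```shell
# wait_for_ui: hypothetical helper that polls a URL until it answers.
# Usage: wait_for_ui [url] [attempts] [delay_seconds]
wait_for_ui() {
  ui_url="${1:-http://localhost:4040}"
  ui_attempts="${2:-30}"
  ui_delay="${3:-2}"
  ui_try=0
  while [ "$ui_try" -lt "$ui_attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (status >= 400);
    # a refused connection also gives a non-zero exit status.
    if curl --silent --fail --output /dev/null --max-time 2 "$ui_url"; then
      echo "Spark UI is answering at $ui_url"
      return 0
    fi
    ui_try=$((ui_try + 1))
    sleep "$ui_delay"
  done
  echo "No response from $ui_url after $ui_attempts attempts" >&2
  return 1
}
```

Run wait_for_ui in a second terminal after starting the container. With no arguments it polls http://localhost:4040 up to 30 times, two seconds apart.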
Running Spark in a single container is handy, but if you want to try out a clustered installation, gettyimages/spark comes with a Docker Compose file. You can use this to try out a Spark cluster made up of containers. Note that you will need to clone the image’s source git repository to get the compose file in addition to the container image.
$ git clone \
    https://github.com/gettyimages/docker-spark.git
$ cd docker-spark
$ docker-compose up
This setup creates a two-node cluster with a master and a single worker, both running as containers.
To connect to the master, we can run an interactive version of gettyimages/spark using a command line similar to the standalone version above:
$ docker run --rm -it gettyimages/spark bin/spark-shell \
    --master spark://$DOCKER_IP:7077
Use the address of your Docker server for $DOCKER_IP, which will be either localhost if you are running Docker locally or your Docker bridge IP address. For Docker for Mac this is 172.17.0.1.
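To avoid retyping the address, the master URL can be built from $DOCKER_IP with a fallback. This is only a sketch: spark_master_url is a hypothetical helper name, and the default port 7077 is taken from the command above.

```shell
# spark_master_url: hypothetical helper that prints a spark:// URL.
# Usage: spark_master_url [host] [port]
# The host falls back to $DOCKER_IP, then to localhost.
spark_master_url() {
  sm_host="${1:-${DOCKER_IP:-localhost}}"
  sm_port="${2:-7077}"
  printf 'spark://%s:%s\n' "$sm_host" "$sm_port"
}

# Example (not executed here): pass the URL to the shell's --master option.
#   docker run --rm -it gettyimages/spark bin/spark-shell \
#       --master "$(spark_master_url 172.17.0.1)"
```

Passing the host explicitly overrides $DOCKER_IP, which is convenient when switching between a local Docker daemon and a remote one.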