Spark 2.3 on Kubernetes
Background
Starting with version 2.3, Spark ships with native support for Kubernetes: spark-submit can submit applications directly to a Kubernetes cluster, which runs the driver and executors as pods.
Running Spark on Kubernetes
Prerequisites:
- A runnable distribution of Spark 2.3 or above.
- A running Kubernetes cluster at version >= 1.6 with access configured to it using kubectl. If you do not already have a working Kubernetes cluster, you may set up a test cluster on your local machine using minikube. We recommend using the latest release of minikube with the DNS addon enabled.
- Be aware that the default minikube configuration is not enough for running Spark applications. We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single executor.
- You must have appropriate permissions to list, create, edit and delete pods in your cluster. You can verify that you can list these resources by running kubectl auth can-i <list|create|edit|delete> pods (see the example after this list).
The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
- You must have Kubernetes DNS configured in your cluster.
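For example, the permissions can be spot-checked against the current kubectl context with kubectl auth can-i, which prints yes or no for each verb (the checks below are illustrative):

$ kubectl auth can-i list pods
$ kubectl auth can-i create pods
$ kubectl auth can-i edit pods
$ kubectl auth can-i delete pods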
Steps
- You need Kubernetes version 1.6 or above. To check the version, run kubectl version.
- The cluster must be configured to use the kube-dns addon. One way to verify that cluster DNS is running is shown below.
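A minimal check, assuming the standard kube-system namespace and the conventional k8s-app=kube-dns label (used by both kube-dns and CoreDNS); a healthy DNS pod should show up as Running:

$ kubectl get pods -n kube-system -l k8s-app=kube-dns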
- Start minikube with the recommended configuration for Spark, for example:
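The values below follow the 3 CPUs / 4g of memory recommendation from the prerequisites; adjust them to your machine:

$ minikube start --cpus 3 --memory 4096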
- Submit a Spark job using:
$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
- Use kubectl cluster-info to get the Kubernetes API server URL to plug into the k8s:// master string, for example:
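The host and port in the output (the address below is illustrative, taken from a typical minikube setup) become <k8s-apiserver-host>:<k8s-apiserver-port> in the spark-submit command above:

$ kubectl cluster-info
Kubernetes master is running at https://192.168.99.100:8443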
- Spark (starting with version 2.3) ships with a Dockerfile in the kubernetes/dockerfiles/ directory that can be used to build a container image for the driver and executors.
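A typical build-and-push sequence uses the bin/docker-image-tool.sh script bundled with the Spark distribution; the registry and tag below are placeholders:

$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push

The resulting image name is what you pass as spark.kubernetes.container.image.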
- Access logs: driver and executor logs can be fetched with kubectl logs, for example:
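A minimal sketch; the driver pod name placeholder is whatever kubectl get pods reports for your application, and -f streams the log while the job runs:

$ kubectl get pods
$ kubectl logs -f <spark-driver-pod>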
- Accessing the Driver UI: forward local port 4040 to the driver pod with kubectl port-forward (see below), then go to http://localhost:4040.
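A minimal sketch of the port forward; <spark-driver-pod> is a placeholder for your driver pod's name:

$ kubectl port-forward <spark-driver-pod> 4040:4040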