How to Install Spark 3 on Windows 10

Sandipan Ghosh
4 min read · Mar 10, 2021

I have been using Spark for a long time. It is an excellent distributed computation framework. I use it regularly at work, and I also have it installed on my local desktop and laptop.

This document shows the steps for installing Spark 3+ on Windows 10 in pseudo-distributed mode.

Steps:-

1. Install WSL2

a. https://docs.microsoft.com/en-us/windows/wsl/install-win10

2. Install Ubuntu 20.04 LTS from the Microsoft Store.

3. Install Windows Terminal from the Microsoft Store. This step is optional; you can use PowerShell or MobaXterm instead.

4. Fire up Ubuntu from WSL: open an Ubuntu tab from Windows Terminal.

5. Once logged in, go to the home directory with "cd ~".

6. For Spark, we need:

a. Python3

b. Java

c. Latest Scala

d. Spark pre-built with Hadoop (binary tarball)

7. Let's download and install all the prerequisites.

8. Install Python:

sudo apt-get install software-properties-common

sudo apt-get install python-software-properties
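On newer Ubuntu releases the python-software-properties package may no longer be available. As a hedge, you can install Python 3 and pip directly; these are standard Ubuntu packages:

sudo apt-get install -y python3 python3-pip   # Python 3 interpreter and pip

python3 --version   # confirm the install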

9. Install Java (OpenJDK):

sudo apt-get install openjdk-8-jdk

10. Check the java and javac versions:

java -version

javac -version

(Screenshot: java and javac version output)
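Spark mainly needs java on the PATH, but some tools also expect JAVA_HOME to be set. A minimal sketch, assuming the default install location for openjdk-8-jdk on 64-bit Ubuntu (adjust the path if your system differs), added to ~/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # default openjdk-8-jdk path on amd64 Ubuntu; verify on your machine

export PATH=$PATH:$JAVA_HOME/bin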
11. Install Scala.

12. Get the Scala binary for Unix:

wget https://downloads.lightbend.com/scala/2.13.3/scala-2.13.3.tgz

tar xvf scala-2.13.3.tgz

13. Edit the .bashrc file to add Scala:

vi ~/.bashrc

14. Add these lines at the end, pointing SCALA_HOME at the directory where Scala was extracted:

export SCALA_HOME=/path/where/scala/is/located   # e.g. /root/scala-2.13.3

export PATH=$PATH:$SCALA_HOME/bin

15. Once done, save and close the file.

16. Let's check the Scala version:

source ~/.bashrc

scala -version

(Screenshot: checking the Scala version)
17. Get the Spark package.

18. I downloaded the pre-built Spark binary (with Hadoop) from the Apache archive:

wget https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

tar xvf spark-3.1.1-bin-hadoop3.2.tgz

vi ~/.bashrc

export SPARK_HOME="/home/sandipan/spark-3.1.1-bin-hadoop3.2"

export PATH=$PATH:$SPARK_HOME/bin
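Optionally, if you have more than one Python installation, you can pin the interpreter PySpark uses with the standard PYSPARK_PYTHON environment variable (add it to ~/.bashrc alongside the lines above):

export PYSPARK_PYTHON=python3   # make PySpark use the system python3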

19. Once done, save and close the file, then run the command below to reload the profile:

source ~/.bashrc

20. Start the Spark services:

cd $SPARK_HOME

21. Start the master server:

./sbin/start-master.sh
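You can confirm the master JVM is up with jps, which ships with the JDK (the PID shown will differ on your machine):

jps -l   # should list a process for org.apache.spark.deploy.master.Master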

22. Once you start the master server, you will get a message saying it has started.
23. You can see the Spark status in the master's web console at http://localhost:8080.
24. There you will see the master URL.
(Screenshot: the master URL in the web console)
25. Mine looks like: "spark://LAPTOP-7DUT93OF.localdomain:7077"
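If you prefer the terminal to the web console, the same URL is normally printed in the master log under $SPARK_HOME/logs; a quick way to pull it out (the exact log file name depends on your user and hostname):

grep -o "spark://.*" $SPARK_HOME/logs/*Master*.out   # prints the master URL from the startup log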
26. We can start a worker using the below command:

SPARK_WORKER_INSTANCES=3 SPARK_WORKER_CORES=2 SPARK_WORKER_MEMORY=7G ./sbin/start-worker.sh spark://LAPTOP-7DUT93OF.localdomain:7077

a. SPARK_WORKER_INSTANCES = how many worker instances you want to start.
b. SPARK_WORKER_CORES = how many cores per instance you want to give. Generally, I give 1 core.
c. SPARK_WORKER_MEMORY = memory per worker. Be very careful with this parameter. My laptop has 32 GB of memory, so I keep 3 GB to 4 GB for Windows, 2 GB for the driver program, and the rest for the worker nodes (see the quick check below).
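As a quick sanity check with the numbers above: 3 workers × 7 GB = 21 GB for the workers, plus roughly 2 GB for the driver, which leaves about 9 GB of the 32 GB machine for Windows and WSL itself.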
(Screenshot: worker nodes started)
27. Open the pyspark shell (or the Scala spark-shell):

$SPARK_HOME/bin/pyspark --master spark://LAPTOP-7DUT93OF.localdomain:7077 --executor-memory 6500mb

$SPARK_HOME/bin/spark-shell --master spark://LAPTOP-7DUT93OF.localdomain:7077 --executor-memory 6500mb

(Screenshot: starting a pyspark shell)
28. To stop all the workers:

SPARK_WORKER_INSTANCES=3 SPARK_WORKER_CORES=2 ./sbin/stop-worker.sh spark://LAPTOP-7DUT93OF.localdomain:7077

29. Or try:

kill -9 $(jps -l | grep spark | awk -F ' ' '{print $1}')
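To also bring down the master, Spark ships matching stop scripts alongside the start scripts in sbin:

./sbin/stop-master.sh   # stop the master daemon

./sbin/stop-all.sh      # or stop the master and all workers together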

Running some code:-

Let's load a CSV file with sales data and run a count check:

sales_data = spark.read.option("header", "true").csv("/mnt/e/Training_Data/5m-Sales-Records/5mSalesRecords.csv")

sales_data.count()
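A couple of handy follow-up calls once the DataFrame is loaded; these are standard DataFrame methods and assume nothing about the file beyond the header row used above:

sales_data.printSchema()   # columns inferred from the header row

sales_data.show(5)         # peek at the first five rows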

(Screenshots: running the job and checking the count; worker nodes, running jobs, and the executor view in the Spark web UI)

Conclusion:-

This is an extremely easy way to use Spark on a laptop or desktop running Windows 10.

We can follow the same steps on Ubuntu or any other Linux distribution.

You can also use Docker for the same approach and just spin up a pre-built image.

Please drop me an email for any suggestions or help.
