.NET for Apache Spark™ Tutorial - Get started in 10 minutes

Install Apache Spark

Download Apache Spark

Apache Spark is downloaded as a .tgz file.

Download Apache Spark 3.0.1

Extract Apache Spark (Windows)

  1. Extract the nested .tar file:
    • Locate the spark-3.0.1-bin-hadoop2.7.tgz file that you downloaded.
    • Right-click the file and select 7-Zip > Extract here.
    • spark-3.0.1-bin-hadoop2.7.tar is created alongside the .tgz file you downloaded.
  2. Extract Apache Spark files:
    • Right-click spark-3.0.1-bin-hadoop2.7.tar and select 7-Zip > Extract files.
    • Enter C:\bin in the Extract to field.
    • Uncheck the checkbox below the Extract to field.
    • Select the OK button.
    • The Apache Spark files are extracted to C:\bin\spark-3.0.1-bin-hadoop2.7\

Open a new command prompt in administrator mode and run the following commands to set the environment variables used to locate Apache Spark:

Command prompt
setx HADOOP_HOME C:\bin\spark-3.0.1-bin-hadoop2.7\
setx SPARK_HOME C:\bin\spark-3.0.1-bin-hadoop2.7\

Once you've installed everything and set your environment variables, open a new command prompt or terminal and run the following command:

Command prompt
spark-submit --version

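If the command runs and prints version information, you can move to the next step. If you receive the error "'spark-submit' is not recognized as an internal or external command", make sure you opened a new command prompt, because variables set with setx are only visible to command prompts opened afterward. Also confirm that the Spark bin folder (%SPARK_HOME%bin) is on your PATH.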
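For a bash terminal on macOS or Linux, the check above can be wrapped in a small guard so that a missing PATH entry is reported clearly before spark-submit is invoked. This is only a sketch: check_on_path is a hypothetical helper name introduced here for illustration and is not part of Apache Spark.

```shell
# Hypothetical helper: report whether a command can be found on PATH.
check_on_path() {
  local cmd="$1" resolved
  # command -v prints the resolved location and fails if nothing is found
  if resolved="$(command -v "$cmd")"; then
    echo "$cmd found at $resolved"
  else
    echo "$cmd not found on PATH" >&2
    return 1
  fi
}

# Usage (only runs spark-submit if it can be resolved):
#   check_on_path spark-submit && spark-submit --version
```

If the helper reports that spark-submit is not found, the likely cause is that the Spark bin folder has not been added to PATH in the current session.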

Extract Apache Spark (macOS/Linux)

In your terminal, move to the folder that contains the file you just downloaded, then run the following commands:

mkdir -p ~/bin
tar xvf spark-3.0.1-bin-hadoop2.7.tgz --directory ~/bin
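The extraction step can also be wrapped in a function that confirms the expected layout afterward. The following is a minimal sketch, assuming the archive's top-level folder has the same name as the archive without the .tgz suffix (as is the case for spark-3.0.1-bin-hadoop2.7.tgz). The name extract_spark is hypothetical and not part of Apache Spark.

```shell
# Hypothetical helper: extract a Spark .tgz archive into a destination
# folder and verify that bin/spark-submit exists afterward.
extract_spark() {
  local archive="$1" dest="$2" top
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -p avoids an error if the destination already exists
  mkdir -p "$dest" || return 1
  tar -xf "$archive" -C "$dest" || return 1
  # Assumption: the top-level folder matches the archive name without .tgz
  top="$(basename "$archive" .tgz)"
  if [ -f "$dest/$top/bin/spark-submit" ]; then
    echo "extracted to $dest/$top"
  else
    echo "bin/spark-submit missing under $dest/$top" >&2
    return 1
  fi
}

# Usage (run from the download folder):
#   extract_spark spark-3.0.1-bin-hadoop2.7.tgz "$HOME/bin"
```

Checking for bin/spark-submit right after extraction catches a partial download or a wrong destination before the environment variables are set.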

Set up the required environment variables by running the following commands:

export SPARK_HOME=~/bin/spark-3.0.1-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:$PATH"
source ~/.bashrc
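Note that export only affects the current shell session, so the variables are lost when the terminal is closed unless the same lines are also stored in a startup file such as ~/.bashrc. One possible way to persist them without creating duplicate entries on repeated runs is sketched below. The helper name append_once is hypothetical, and the sketch assumes bash with ~/.bashrc as the startup file.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup stays idempotent.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $SPARK_HOME and $PATH unexpanded in the file):
#   append_once 'export SPARK_HOME=~/bin/spark-3.0.1-bin-hadoop2.7' "$HOME/.bashrc"
#   append_once 'export PATH="$SPARK_HOME/bin:$PATH"' "$HOME/.bashrc"
#   source ~/.bashrc
```

With the lines stored in ~/.bashrc, running source ~/.bashrc (or opening a new terminal) makes SPARK_HOME and the updated PATH available.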

Run spark-shell

Run the following command: