Apache Spark is downloaded as a .tgz file. Download Apache Spark 3.0.1.
Open a new command prompt in administrator mode and run the following commands to set the environment variables used to locate Apache Spark:
setx HADOOP_HOME C:\bin\spark-3.0.1-bin-hadoop2.7\
setx SPARK_HOME C:\bin\spark-3.0.1-bin-hadoop2.7\
setx /M PATH "%PATH%;%HADOOP_HOME%;%SPARK_HOME%\bin"
Once you've installed everything and set your environment variables, open a new command prompt or terminal and run the following command:
spark-submit --version
If the command runs and prints version information, you can move to the next step.
If you receive a "'spark-submit' is not recognized as an internal or external command" error, make sure you opened a new command prompt.
In your terminal, move to the folder that contains the file you just downloaded, then run the following commands:
mkdir ~/bin
tar xvf spark-3.0.1-bin-hadoop2.7.tgz --directory ~/bin
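Before going further, it can help to confirm that the extraction produced the expected layout. The following is a minimal sketch, assuming the archive was extracted into ~/bin as above and that the distribution ships its launcher at bin/spark-submit; the helper name check_spark_dir is hypothetical and not part of Spark.

```shell
# check_spark_dir DIR: succeed if DIR looks like an extracted Spark distribution.
# (check_spark_dir is an illustrative helper, not something Spark provides.)
check_spark_dir() {
  dir="$1"
  if [ -f "$dir/bin/spark-submit" ]; then
    echo "found spark-submit in $dir/bin"
  else
    echo "spark-submit not found under $dir/bin" >&2
    return 1
  fi
}

# Usage, assuming the extraction path used above:
# check_spark_dir ~/bin/spark-3.0.1-bin-hadoop2.7
```

If the check fails, the usual causes are running tar from a different folder than the download or extracting to a different target directory.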
Set up the required environment variables by running the following commands:
export SPARK_HOME=~/bin/spark-3.0.1-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:$PATH"
source ~/.bashrc
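Note that export only affects the current shell session, and source ~/.bashrc re-reads whatever is already in that file. If you want the variables available in new terminals, one option is to append the same export lines to ~/.bashrc. The following is a minimal sketch under that assumption; the helper name persist_spark_env is hypothetical, and the paths are the ones used above.

```shell
# persist_spark_env [RCFILE]: append the Spark exports to RCFILE
# (default: ~/.bashrc) unless a SPARK_HOME export is already present.
# Illustrative helper only; it is not part of Spark.
persist_spark_env() {
  rcfile="${1:-$HOME/.bashrc}"
  if grep -q '^export SPARK_HOME=' "$rcfile" 2>/dev/null; then
    echo "SPARK_HOME already set in $rcfile"
    return 0
  fi
  {
    # Single quotes keep ~ and $SPARK_HOME literal so they expand when the file is sourced.
    echo 'export SPARK_HOME=~/bin/spark-3.0.1-bin-hadoop2.7'
    echo 'export PATH="$SPARK_HOME/bin:$PATH"'
  } >> "$rcfile"
  echo "appended Spark exports to $rcfile"
}

# Usage:
# persist_spark_env && source ~/.bashrc
```

The grep guard makes the helper safe to run more than once, so the export lines are not duplicated in the file.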
Run the following command: