What to set `SPARK_HOME` to?

Asked on January 21, 2019 in Apache-spark.


  • 13 Answer(s)

    Two environment variables are required to solve this issue (a quick check of the result is sketched below the snippet):

    SPARK_HOME=/spark
    PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-VERSION-src.zip:$PYTHONPATH
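
    Once those are exported, a quick check from Python can confirm that the paths actually resolve. This is a minimal sketch; the /spark prefix is the placeholder used above, and the py4j archive name depends on the version shipped with your Spark install:

        import glob
        import os

        # SPARK_HOME should point at the Spark installation directory.
        spark_home = os.environ.get("SPARK_HOME", "/spark")
        print("SPARK_HOME =", spark_home)

        # The py4j archive lives under $SPARK_HOME/python/lib; its file name
        # carries the bundled py4j version, hence the wildcard.
        py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
        print("py4j archives found:", py4j_zips or "none - check the PYTHONPATH entry")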
    
    Answered on January 21, 2019.

    Set SPARK_HOME

    export SPARK_HOME=/home/farmer/spark
    Set PYTHONPATH

    PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
    PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
    export PYTHONPATH
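
    If editing the shell environment is not an option (for example inside a notebook), the same two settings can be applied from within Python before pyspark is imported. A minimal sketch, assuming the /home/farmer/spark location and the py4j-0.9 archive from the snippet above:

        import os
        import sys

        # Assumed install location from the answer above; adjust to your machine.
        spark_home = "/home/farmer/spark"
        os.environ["SPARK_HOME"] = spark_home

        # Mirror the two PYTHONPATH entries so pyspark and py4j are importable.
        sys.path.insert(0, os.path.join(spark_home, "python"))
        sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))

        import pyspark  # should now succeed
        print(pyspark.__file__)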

    Answered on January 23, 2019.

    I am relatively new to Spark and Scala.

    I have a Scala application that runs in local mode both on my Windows box and on a CentOS cluster.

    As long as Spark is on my classpath (i.e., declared in pom.xml), it runs inside my unit tests without needing a SPARK_HOME. But then how do I set Spark properties such as spark.driver.memory?
    If I do have an instance of Spark running locally, my unit-test application seems to ignore it when in local mode. I see no output on the Spark console suggesting it is using the instance I started from the command line (via the spark-shell command). Am I mistaken? If not, how do I get my Scala application to use that instance?
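
    Most Spark properties can be set programmatically on the session builder (or a SparkConf) before the context starts, with no SPARK_HOME required; a minimal PySpark sketch of that pattern follows, and the Scala SparkSession.builder API takes the same config calls. Two caveats: spark.driver.memory is applied when the driver JVM is launched, so inside an already-running JVM (such as a unit test) the heap is whatever that JVM was started with (e.g. -Xmx via the Surefire argLine); and master("local[*]") runs everything inside the application's own process, so it will not attach to a separately started spark-shell.

        from pyspark.sql import SparkSession

        # Minimal sketch: set properties on the builder before the session starts.
        # The app name and the partition count are illustrative values.
        spark = (
            SparkSession.builder
            .master("local[*]")
            .appName("unit-test-sketch")
            .config("spark.sql.shuffle.partitions", "4")
            .getOrCreate()
        )

        print(spark.conf.get("spark.sql.shuffle.partitions"))
        spark.stop()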

    Answered on January 27, 2019.

    Installed Apache Maven 3.3.3 and Scala 2.11.6, then ran:

    $ git clone git://github.com/apache/spark.git -b branch-1.4
    $ cd spark
    $ build/mvn -DskipTests clean package

    Finally:

    $ git clone https://github.com/apache/incubator-zeppelin
    $ cd incubator-zeppelin/
    $ mvn install -DskipTests

    Then ran the server:

    $ bin/zeppelin-daemon.sh start

    Running a simple notebook beginning with %pyspark, I got an error about py4j not being found, so I just did pip install py4j.

    Now I’m getting this error:

    pyspark is not responding
    Traceback (most recent call last):
      File "/tmp/zeppelin_pyspark.py", line 22, in <module>
        from pyspark.conf import SparkConf
    ImportError: No module named pyspark.conf
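
    That ImportError usually means the pyspark package is not on the interpreter's module search path, i.e. the SPARK_HOME/PYTHONPATH settings from the earlier answers are not visible to the process Zeppelin launches. A small diagnostic, run with the same python binary, is sketched below; the printed values should include $SPARK_HOME/python and a py4j-*-src.zip entry:

        import os
        import sys

        # Show what this interpreter actually sees.
        print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
        print("PYTHONPATH:", os.environ.get("PYTHONPATH"))

        # pyspark.conf is only importable if $SPARK_HOME/python is on sys.path.
        try:
            from pyspark.conf import SparkConf  # noqa: F401
            print("pyspark.conf import OK")
        except ImportError as exc:
            print("pyspark.conf import failed:", exc)
            for entry in sys.path:
                print("  sys.path:", entry)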

     

    Answered on January 30, 2019.

    You need to go to wherever your Spark client is installed. Depending on your install/OS, it may be, for example (a sketch for locating it follows below):

    1. /usr/hdp/current/sparkclient/sbin
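
    If you are not sure where the client lives, one way to locate it is to find the spark-submit launcher and take the directory above its bin folder as SPARK_HOME. A minimal sketch (shutil.which only helps if the launcher is already on your PATH):

        import os
        import shutil

        # Locate the spark-submit launcher on the PATH, if any.
        launcher = shutil.which("spark-submit")
        if launcher is None:
            print("spark-submit not on PATH; check your distribution's install directory")
        else:
            # SPARK_HOME is the directory that contains bin/spark-submit.
            spark_home = os.path.dirname(os.path.dirname(os.path.realpath(launcher)))
            print("Suggested SPARK_HOME:", spark_home)
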
    Answered on February 2, 2019.

    Set the SPARK_HOME environment variable

    Set the SPARK_HOME environment variable. This slightly speeds up some operations, including the connection time.

    spark_home_set(path = NULL, ...)

    Arguments

    path: A string containing the path to the installation location of Spark. If NULL, the path to the latest installed Spark/Hadoop version is used.
    ...: Additional parameters, not currently used.

    Value

    The function is mostly invoked for the side-effect of setting the SPARK_HOME environment variable. It also returns TRUE if the environment was successfully set, and FALSE otherwise.

    Examples

    # NOT RUN {
    # Not run due to side-effects
    spark_home_set()
    # }
    Answered on February 4, 2019.

    You should install Spark and set the SPARK_HOME variable. In a Unix terminal, run the following to set it:

    export SPARK_HOME="/path/to/spark"
    To make this setting persistent, append that line to the end of your .bashrc.
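
    As a final check, you can verify from Python that the variable points at a real Spark install; this minimal sketch just looks for the bin/spark-submit launcher under the configured directory:

        import os

        spark_home = os.environ.get("SPARK_HOME")
        if not spark_home:
            print("SPARK_HOME is not set in this shell")
        elif os.path.exists(os.path.join(spark_home, "bin", "spark-submit")):
            print("SPARK_HOME looks valid:", spark_home)
        else:
            print("SPARK_HOME is set but bin/spark-submit was not found:", spark_home)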

    Answered on February 5, 2019.

