Spark 1.4 increase maxResultSize memory

Asked on December 31, 2018 in Apache-spark.


  • 3 Answers

    Set the spark.driver.maxResultSize parameter on the SparkConf object:

    from pyspark import SparkConf, SparkContext
     
    # In Jupyter you have to stop the current context first
    sc.stop()
     
    # Create new config
    conf = (SparkConf()
        .set("spark.driver.maxResultSize", "2g"))
     
    # Create new context
    sc = SparkContext(conf=conf)
    

    If you also need a SQLContext, create a new one from the new context:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
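
    With the limit raised to 2g, collect() can return up to 2 GB of serialized results to the driver. As a minimal sketch (hypothetical example data, reusing the conf and sqlContext created above):

    # Hypothetical data; any DataFrame is collected the same way
    df = sqlContext.createDataFrame(
        [(i, "row-%d" % i) for i in range(1000)], ["id", "label"])
    rows = df.collect()  # the driver now accepts up to 2 GB of serialized results

    # The value set on the config object can be read back for confirmation
    print(conf.get("spark.driver.maxResultSize"))  # prints "2g"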
    
    Answered on December 31, 2018.

    The maximum result size can also be increased when launching PySpark, for example: pyspark --conf spark.driver.maxResultSize=3g
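
    A quick way to confirm the flag took effect (assuming your PySpark version exposes SparkContext.getConf()) is to read the value back from the running context:

    # Inside a shell started with --conf spark.driver.maxResultSize=3g
    print(sc.getConf().get("spark.driver.maxResultSize"))  # expected: "3g"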

    Answered on December 31, 2018.

    The same error can also be caused by a Spark bug, SPARK-12837 (https://issues.apache.org/jira/browse/SPARK-12837):

    serialized results of X tasks (Y MB) is bigger than spark.driver.maxResultSize
    

    In that case the data is not being pulled to the driver explicitly.

    SPARK-12837 describes a bug in which, prior to Spark 2, accumulator and broadcast-variable updates were pulled back to the driver with the task results, which could push the total over spark.driver.maxResultSize and cause this error.
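
    As a rough illustration of that scenario (a hypothetical job; nothing is collect()ed, yet per-task accumulator updates are still serialized back to the driver and can count against the limit on affected 1.x versions):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().set("spark.driver.maxResultSize", "1g")
    sc = SparkContext(conf=conf)

    # Accumulator updates travel back to the driver with every task result
    counter = sc.accumulator(0)

    def tag(record):
        counter.add(1)
        return record

    # A job with many tasks: only a count reaches the driver explicitly,
    # but the serialized task results can still add up toward the limit
    sc.parallelize(range(1000000), 2000).map(tag).count()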

    Answered on December 31, 2018.

