Spark 1.4 increase maxResultSize memory
Set the spark.driver.maxResultSize parameter on the SparkConf object:
from pyspark import SparkConf, SparkContext

# In Jupyter you have to stop the current context first
sc.stop()

# Create new config
conf = (SparkConf()
        .set("spark.driver.maxResultSize", "2g"))

# Create new context
sc = SparkContext(conf=conf)
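To confirm the new limit took effect, you can read it back from the running context (a minimal sketch; SparkContext.getConf() returns the active SparkConf):

# Verify the setting on the running context
print(sc.getConf().get("spark.driver.maxResultSize"))  # prints: 2g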
Then a new SQLContext is created:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
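As a quick end-to-end check, the new context and SQLContext can be exercised with a small DataFrame (the sample rows below are made up for illustration):

# Hypothetical sample rows, just to exercise the new sqlContext
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.collect()  # small result, well under the 2g limit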
The same error can also be raised by a Spark bug, SPARK-12837 (https://issues.apache.org/jira/browse/SPARK-12837):
serialized results of X tasks (Y MB) is bigger than spark.driver.maxResultSize
In this case, the data is not pulled to the driver explicitly.
SPARK-12837 tracks a Spark bug: prior to Spark 2, accumulators and broadcast variables were pulled to the driver as part of task results, which caused this error even though nothing was collected explicitly.
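If you hit SPARK-12837 and cannot upgrade to Spark 2, the usual mitigations are raising the limit further or setting it to 0, which the Spark configuration docs describe as unlimited. A minimal sketch (use with care, since an unbounded limit lets large results exhaust driver memory):

from pyspark import SparkConf, SparkContext

sc.stop()

# 0 removes the limit entirely; the driver can still run out of
# memory if serialized results are genuinely huge
conf = (SparkConf()
        .set("spark.driver.maxResultSize", "0"))
sc = SparkContext(conf=conf)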