Would Spark unpersist the RDD itself when it realizes it won’t be used anymore?



  • 1 Answer(s)

    Yes. The RDD is unpersisted by Apache Spark itself once it gets garbage collected.

    This can be seen in RDD.persist:

    sc.cleaner.foreach(_.registerRDDForCleanup(this))
    
    

    Persisting keeps only a weak (not strong) reference to the RDD in a ReferenceQueue, which leads to ContextCleaner.doCleanupRDD being called when the RDD is garbage collected:

    sc.unpersistRDD(rddId, blocking)
    

    You can find more context in ContextCleaner in general, and in the commit that added it.
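
    To make the mechanism concrete, here is a minimal, self-contained Scala sketch of the same weak-reference pattern. This is a simplified illustration, not Spark's internal code: the Resource class and its id are made up. A resource is registered with a ReferenceQueue, and the cleanup logic runs once the garbage collector enqueues its weak reference.

    import java.lang.ref.{ReferenceQueue, WeakReference}

    // Hypothetical stand-in for an RDD-like resource; not Spark's actual class.
    final class Resource(val id: Int)

    object CleanerSketch {
      def main(args: Array[String]): Unit = {
        val queue = new ReferenceQueue[Resource]()

        // "Register for cleanup": keep only a weak reference, plus the id we
        // will still need after the object itself has been collected.
        var resource: Resource = new Resource(42)
        val ref = new WeakReference[Resource](resource, queue)
        val trackedId = resource.id

        // Drop the strong reference and suggest a GC run.
        resource = null
        System.gc()

        // Once the Resource is collected, its weak reference appears in the
        // queue; this is analogous to the point where Spark's ContextCleaner
        // would call unpersistRDD.
        val enqueued = queue.remove(1000) // wait up to 1 second
        if (enqueued eq ref) println(s"cleaning up resource $trackedId")
        else println("resource was not collected yet")
      }
    }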

    Keep the following in mind when relying on garbage collection to unpersist RDDs (a sketch illustrating both points follows these notes):

    The RDD's resources are used on the executors, but the garbage collection that triggers cleanup happens on the driver. The RDD will not be automatically unpersisted until there is enough memory pressure on the driver, no matter how full the executors' memory or disk gets.

    You cannot unpersist only part of an RDD (certain partitions or records). When one persisted RDD is built from another, both have to fit entirely on the executors at the same time.
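
    The sketch below illustrates both points, assuming a local SparkSession used purely for illustration; the expensiveTransform helper is hypothetical. It persists a parent RDD, builds and persists a child from it, materializes the child, and then unpersists the parent explicitly instead of waiting for the driver-side garbage collector.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object ChainedCachingSketch {
      // Hypothetical placeholder for a costly computation.
      def expensiveTransform(x: Int): Int = x * x

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("chained-caching-sketch")
          .getOrCreate()
        val sc = spark.sparkContext

        val parent = sc.parallelize(1 to 1000000).map(expensiveTransform)
        parent.persist(StorageLevel.MEMORY_AND_DISK)

        val child = parent.filter(_ % 3 == 0)
        child.persist(StorageLevel.MEMORY_AND_DISK)

        // Materialize the child while the parent is still cached, so the
        // filter does not recompute the expensive transform.
        child.count()

        // At this point both RDDs occupy executor storage at the same time.
        // Unpersisting the parent explicitly frees that space immediately,
        // instead of whenever the driver happens to garbage-collect it.
        parent.unpersist(blocking = true)

        println(child.count()) // served from the child's own cache
        spark.stop()
      }
    }

    Explicit unpersist gives deterministic control over executor storage, whereas the GC-driven cleanup only kicks in once the driver itself comes under memory pressure.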

    Answered on January 5, 2019.

