Would Spark unpersist the RDD itself when it realizes it won't be used anymore?
Yes, Apache Spark unpersists the RDD automatically once it is garbage collected. The cleanup works through `ContextCleaner`: Spark holds only a weak (not strong) reference to the registered RDD in a `ReferenceQueue`, which triggers `ContextCleaner.doCleanupRDD` when the RDD is garbage collected.
For more context, see `ContextCleaner` in general and the commit that added it.
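The weak-reference mechanism can be illustrated outside of Spark. The sketch below is not Spark's actual code; `FakeRdd`, `CleanupRef`, and `tryClean` are hypothetical names used only to show the JVM pattern that `ContextCleaner` relies on: keep weak references to registered objects in a `ReferenceQueue`, and run cleanup when the referent is collected.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

object CleanupSketch {
  // Hypothetical stand-in for an RDD; only its id matters here.
  final class FakeRdd(val id: Int)

  // Carries the id alongside the weak reference, so cleanup still knows
  // which RDD to unpersist after the referent itself is gone.
  final class CleanupRef(rdd: FakeRdd, queue: ReferenceQueue[FakeRdd])
      extends WeakReference[FakeRdd](rdd, queue) {
    val rddId: Int = rdd.id
  }

  // Returns Some(id) if the fake RDD was collected and enqueued in time.
  def tryClean(): Option[Int] = {
    val queue = new ReferenceQueue[FakeRdd]
    // Keeps the *reference objects* strongly reachable, so they are
    // enqueued when their referents die (ContextCleaner does the same
    // with an internal buffer of registered references).
    val buffer = mutable.Set.empty[CleanupRef]

    var rdd: FakeRdd = new FakeRdd(42)
    buffer += new CleanupRef(rdd, queue)
    rdd = null                      // drop the only strong reference

    var found: Option[Int] = None
    var tries = 0
    while (found.isEmpty && tries < 100) {
      System.gc()                   // best-effort request; usually enough here
      Thread.sleep(10)
      found = Option(queue.poll()).map { r =>
        val ref = r.asInstanceOf[CleanupRef]
        buffer -= ref               // done with this reference
        ref.rddId                   // here Spark would run doCleanupRDD
      }
      tries += 1
    }
    found
  }

  def main(args: Array[String]): Unit =
    println(tryClean().fold("referent not collected in this run")(
      id => s"would unpersist RDD $id"))
}
```

Note that `System.gc()` is only a hint to the JVM, which is exactly why the caveats below matter: cleanup happens when the driver's collector decides to run, not when the cached data stops being useful.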
A few things to keep in mind when relying on garbage collection to unpersist RDDs:
- The RDD's resources live on the executors, while garbage collection happens on the driver. No matter how full the executors' memory or disk gets, the RDD will not be automatically unpersisted until there is enough memory pressure on the driver.
- You cannot unpersist part of an RDD (some partitions or records). When one persisted RDD is built from another, both will have to fit entirely on the executors at the same time.
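Because of these caveats, it is often safer not to rely on driver-side GC and instead call `unpersist` explicitly as soon as a cached RDD is no longer needed. A minimal sketch, assuming a local `SparkContext` (requires the `spark-core` dependency; the app name and data are arbitrary):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ExplicitUnpersist {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("unpersist-demo"))
    try {
      val base    = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_ONLY)
      val derived = base.map(_ * 2).persist(StorageLevel.MEMORY_ONLY)

      derived.count()                 // materializes both persisted RDDs

      // Free executor memory for `base` immediately, instead of waiting
      // for the driver's garbage collector to notice it is unreachable.
      base.unpersist(blocking = true)

      println(derived.filter(_ % 4 == 0).count())
    } finally sc.stop()
  }
}
```

`blocking = true` makes the call wait until the blocks are actually removed, which is useful when the next stage needs that executor memory right away.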