
Drop spark dataframe from cache

I am using Spark 1.3.0 with the Python API. While transforming huge dataframes, I cache many DFs for faster execution:

df1.cache()
df2.cache()

Once a certain dataframe is no longer needed, how can I drop it from memory (or un-cache it)?

For example, df1 is used throughout the code, while df2 is only needed for a few transformations and is never used after that. I want to forcefully drop df2 to release more memory.

Just do the following:

df1.unpersist()
df2.unpersist()

Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
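
For the scenario described in the question, a minimal sketch of the pattern (the column name and the transformation are illustrative assumptions, not from the original post):

df1.cache()          # df1 is reused throughout the job, so keep it cached
df2.cache()          # df2 is only needed for a few transformations

rows = df2.filter(df2["some_column"] > 0).count()   # hypothetical work that uses df2

df2.unpersist()      # df2 is done: release its cached blocks right away
                     # instead of waiting for LRU eviction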

If the dataframe is registered as a table for SQL operations, like

df.createGlobalTempView(tableName)  # or some other way, depending on the Spark version

then the cache can be dropped with the commands below; a short sketch follows each list. Of course, Spark also does this automatically.

Spark >= 2.x

Here, spark is a SparkSession object.

  • Drop a specific table/df from cache

     spark.catalog.uncacheTable(tableName)
  • Drop all tables/dfs from cache

     spark.catalog.clearCache()
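
An end-to-end sketch for Spark >= 2.x; it uses createOrReplaceTempView and the table name "my_table" as illustrative choices (with createGlobalTempView the name would be qualified as global_temp.my_table):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()
df = spark.range(1000)                              # any DataFrame works here

df.createOrReplaceTempView("my_table")              # register the DF for SQL access
spark.catalog.cacheTable("my_table")                # mark the table as cached
spark.sql("SELECT COUNT(*) FROM my_table").show()   # an action populates the cache

spark.catalog.uncacheTable("my_table")              # drop just this table from the cache
spark.catalog.clearCache()                          # or drop everything that is cached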

Spark <= 1.6.x

  • Drop a specific table/df from cache

     sqlContext.uncacheTable(tableName)
  • Drop all tables/dfs from cache

     sqlContext.clearCache()
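
Since the question mentions Spark 1.3.0, here is a comparable sketch for the 1.x API; the sample data and the table name are illustrative assumptions:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="cache-demo-1x")
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.registerTempTable("my_table")                    # temp-table registration in Spark 1.x
sqlContext.cacheTable("my_table")                   # cache the table
sqlContext.sql("SELECT COUNT(*) FROM my_table").show()

sqlContext.uncacheTable("my_table")                 # drop this table from the cache
sqlContext.clearCache()                             # or drop all cached tables
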
  1. If you need the removal to block until the data is actually dropped => df2.unpersist(True)
  2. Non-blocking removal => df2.unpersist()
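
Note that in the Python API the flag is True, not true. A minimal illustration, assuming df2 is currently cached:

df2.unpersist(True)   # blocking: returns only after the cached blocks are removed
# ... later, after caching df2 again ...
df2.unpersist()       # non-blocking: returns immediately, blocks are freed in the background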

Here is a simple utility context manager that takes care of that for you:

import contextlib

@contextlib.contextmanager
def cached(df):
    # Cache the dataframe for the duration of the with-block and
    # always unpersist it when the block exits, even on an exception.
    df_cached = df.cache()
    try:
        yield df_cached
    finally:
        df_cached.unpersist()
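
A possible usage, assuming df is an existing DataFrame; the cache is released automatically when the block exits, even if an exception is raised:

with cached(df) as df_c:
    first = df_c.count()    # the first action materializes the cache
    second = df_c.count()   # later actions reuse the cached data
# at this point df_c has been unpersisted by the context manager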
