
How to make shark/spark clear the cache?

When I run my Shark queries, memory gets hoarded in main memory. This is the output of my top command:


Mem:  74237344k total, 70080492k used,  4156852k free,   399544k buffers
Swap:  4194288k total,      480k used,  4193808k free, 65965904k cached


This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

Has anyone faced this issue before? Is it a configuration problem or a known issue in Spark/Shark?

To remove all cached data:

sqlContext.clearCache()

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html
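
As a minimal sketch of how clearCache() fits into a session (assuming an existing SparkContext sc and a registered temporary table named "events"; neither name comes from the original post):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)                    // sc is an existing SparkContext
sqlContext.cacheTable("events")                        // mark the registered table for caching
sqlContext.sql("SELECT COUNT(*) FROM events").show()   // the first action materializes the cache
sqlContext.clearCache()                                // drops every cached table/DataFrame at once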

If you want to remove a specific DataFrame from the cache:

df.unpersist()
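
A short sketch of caching and then evicting a single DataFrame (the file path and the variable name df are placeholders, not from the original answer):

val df = sqlContext.read.json("people.json")
df.cache()        // mark the DataFrame for caching
df.count()        // the first action actually fills the cache
df.unpersist()    // evict only this DataFrame, leaving other cached data alone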

Are you using the cache() method to persist RDDs?

cache() just calls persist(), so to remove the cache for an RDD, call unpersist().
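
For completeness, a minimal sketch of the RDD-level calls (assuming an existing SparkContext sc; the data is made up):

val rdd = sc.parallelize(1 to 1000000)
rdd.cache()       // equivalent to rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()       // materializes the cached partitions on the executors
rdd.unpersist()   // removes the cached blocks from executor memory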

This is weird. The question asked has nothing to do with the answers. The cache the OP posted is owned by the operating system and has nothing to do with Spark. It is an optimization of the OS, and we shouldn't be worried about that particular cache.

Spark's cache is usually in memory too, but it shows up in the RSS section, not the cache section, of the OS.

I followed this approach and it worked fine for me:

for ((k,v) <- sc.getPersistentRDDs) {
   v.unpersist()
}

sc.getPersistentRDDs is a Map which stores the details of the cached data.

scala> sc.getPersistentRDDs

res48: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()
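
If you want to see what is being dropped and wait until each RDD is actually removed, a slightly expanded variant of the loop above can help (the logging is my addition; unpersist(blocking = true) is the standard RDD API):

for ((id, rdd) <- sc.getPersistentRDDs) {
  println(s"Unpersisting RDD $id (name: ${rdd.name})")
  rdd.unpersist(blocking = true)   // block until the cached blocks are gone
}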

The solution proposed:

sqlContext.clearCache()

gave me an error and I had to use this one instead:

sqlContext.catalog.clearCache()
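
For reference, on Spark 2.x with a SparkSession (named spark by default in spark-shell), the equivalent calls would be something like the following (the table name "events" is just an example):

spark.catalog.clearCache()              // drop all cached tables/DataFrames
spark.catalog.uncacheTable("events")    // or drop a single cached table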
