
Can SparkSession.catalog.clearCache() delete data from hdfs?

I have been experiencing a data deletion issue since we migrated from CDH to HDP (Spark 2.2 to 2.3). The tables are read from an HDFS location, and after a Spark job that reads and processes those tables has been running for some time, it throws a table not found exception; when we check that location, all the records have vanished. In the Spark (Java) code I see that clearCache() is called before the table is read. Can it delete those files? If yes, how do I fix it?

I think you should look at the source code. Spark has its own implementation for caching user data, managed via CacheManager, and it never deletes the underlying data while managing that cache. Have a look.
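To illustrate the distinction, here is a minimal sketch (the Parquet path hdfs:///data/my_table, the view name my_table, and the local master are made-up for the demo). Caching a table and then calling clearCache() only drops the in-memory blocks that Spark's CacheManager tracks; the source files on HDFS are untouched, so the same data can be read again afterwards:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClearCacheDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clear-cache-demo")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical Parquet dataset on HDFS.
        Dataset<Row> df = spark.read().parquet("hdfs:///data/my_table");
        df.createOrReplaceTempView("my_table");

        // Materialize the view in Spark's cache (managed by CacheManager).
        spark.catalog().cacheTable("my_table");
        System.out.println("cached: " + spark.catalog().isCached("my_table")); // true

        // Drops all cached plans from memory; does NOT touch the files on HDFS.
        spark.catalog().clearCache();
        System.out.println("cached: " + spark.catalog().isCached("my_table")); // false

        // The underlying files are still there, so re-reading succeeds.
        System.out.println("rows: " + spark.read().parquet("hdfs:///data/my_table").count());

        spark.stop();
    }
}
```

If records are actually disappearing from the HDFS directory, the cause is more likely elsewhere in the job, e.g. a write with SaveMode.Overwrite against the same path the job reads from, rather than clearCache().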
