
How to set the Spark RDD StorageLevel in Hive on Spark?

In my Hive on Spark job, I get this error:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

Thanks to this answer (Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?), I know my Hive on Spark job may have the same problem.

Since Hive translates the SQL into a Hive on Spark job, I don't know how to configure this in Hive. How can I make the Hive on Spark job use StorageLevel.MEMORY_AND_DISK instead of StorageLevel.MEMORY_ONLY?

Thanks for your help.

You can use CACHE [LAZY] TABLE <table_name> and UNCACHE TABLE <table_name> to manage caching. See the Spark SQL documentation for more details.
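As a rough sketch of how those statements could be issued, the Python helpers below build the SQL text and run it through a SparkSession. This assumes the query goes through Spark SQL, which may not hold for Hive's own Spark execution engine. The helper names and the table argument are hypothetical, and the OPTIONS ('storageLevel' ...) clause is only accepted by Spark SQL 3.0 and later.

```python
def cache_table_statement(table, lazy=False, storage_level=None):
    """Build a Spark SQL CACHE TABLE statement for `table`."""
    parts = ["CACHE"]
    if lazy:
        parts.append("LAZY")  # defer caching until the table is first used
    parts.append(f"TABLE {table}")
    if storage_level is not None:
        # Assumption: Spark SQL 3.0+; older versions reject the OPTIONS clause.
        parts.append(f"OPTIONS ('storageLevel' '{storage_level}')")
    return " ".join(parts)


def uncache_table_statement(table):
    """Build the matching UNCACHE TABLE statement."""
    return f"UNCACHE TABLE {table}"


def cache_then_release(spark, table):
    """Cache `table`, run one query against it, then drop it from the cache.

    `spark` is an existing pyspark.sql.SparkSession and `table` must already
    exist in its catalog.
    """
    spark.sql(cache_table_statement(table, storage_level="MEMORY_AND_DISK"))
    row_count = spark.sql(f"SELECT COUNT(*) AS n FROM {table}").collect()[0]["n"]
    spark.sql(uncache_table_statement(table))
    return row_count
```

With a session in hand, cache_then_release(spark, "my_table") would cache the table at MEMORY_AND_DISK, count its rows, and uncache it again; "my_table" is a placeholder name.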

If you are using DataFrames, you can call persist(...) to specify the StorageLevel. See the DataFrame API documentation.
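For the DataFrame route, a minimal PySpark sketch might look like the following. The function names, the local[1] master and the spark.range(1000) test data are assumptions made for illustration; pyspark is imported inside demo() so the helper itself carries no dependency.

```python
def persist_and_materialize(df, level):
    """Mark `df` for caching at `level`, then force it to be computed."""
    cached = df.persist(level)  # lazy: only records the requested storage level
    cached.count()              # an action, so the data is actually cached
    return cached


def demo():
    """Run the helper against a throwaway local session (needs pyspark)."""
    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("persist-sketch")
        .getOrCreate()
    )
    try:
        df = spark.range(1000)  # placeholder data
        cached = persist_and_materialize(df, StorageLevel.MEMORY_AND_DISK)
        print(cached.storageLevel)  # shows the level now in effect
        cached.unpersist()
    finally:
        spark.stop()
```

Calling demo() on a machine with pyspark installed runs the whole round trip. Note that persist() alone does nothing until an action such as count() runs, which is why the helper triggers one.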

In addition to setting the storage level, you can tune other things as well. Spark SQL uses a different caching mechanism, columnar storage, which caches data more efficiently because Spark SQL is schema-aware. A separate set of configuration properties controls this cache, as described in detail in the documentation (the link points to the latest version; refer to the documentation for the version you are using):

  • spark.sql.inMemoryColumnarStorage.compressed
  • spark.sql.inMemoryColumnarStorage.batchSize
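One possible way to apply these two properties is sketched below in Python. The values shown are, as far as I know, the defaults in recent Spark releases, so treat them as placeholders to tune; the helper names are hypothetical.

```python
# Assumed starting values (the documented defaults in recent Spark releases).
COLUMNAR_CACHE_SETTINGS = {
    "spark.sql.inMemoryColumnarStorage.compressed": "true",
    "spark.sql.inMemoryColumnarStorage.batchSize": "10000",
}


def set_statements(settings):
    """Render the settings as SET statements for a SQL session."""
    return [f"SET {key}={value}" for key, value in settings.items()]


def apply_settings(spark, settings):
    """Apply the settings to an existing pyspark.sql.SparkSession."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

set_statements(COLUMNAR_CACHE_SETTINGS) gives text you can paste into a SQL shell, while apply_settings(spark, COLUMNAR_CACHE_SETTINGS) does the same through the runtime configuration. A larger batchSize can improve compression and memory use but raises the risk of out-of-memory errors while caching.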

