
How to set Spark RDD StorageLevel in Hive on Spark?

In my Hive on Spark job, I get this error:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

Thanks to this answer (Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?), I suspect my Hive on Spark job has the same problem.

Since Hive translates the SQL into a Hive on Spark job, how do I set this in Hive so that the generated job uses StorageLevel.MEMORY_AND_DISK instead of StorageLevel.MEMORY_ONLY?

Thanks for your help!

You can use CACHE [LAZY] TABLE <table_name> and UNCACHE TABLE <table_name> to manage caching. More details are in the Spark SQL documentation.
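
A minimal sketch of what that looks like when issued through a SparkSession (the table name my_table is just a placeholder):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cache-table-example")
      .enableHiveSupport()
      .getOrCreate()

    // LAZY defers materialization until the table is first scanned
    spark.sql("CACHE LAZY TABLE my_table")

    // ... queries that reuse my_table ...

    // Drop the cached copy when it is no longer needed
    spark.sql("UNCACHE TABLE my_table")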

If you are using DataFrames, you can use persist(...) to specify the StorageLevel. See the Dataset.persist API documentation.
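
As a rough sketch (the source table name is a placeholder), persisting with MEMORY_AND_DISK looks like this:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder()
      .appName("persist-example")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder source; substitute your own table or file
    val df = spark.table("my_table")

    // Spill partitions that do not fit in memory to local disk
    // instead of dropping and recomputing them
    df.persist(StorageLevel.MEMORY_AND_DISK)

    df.count()      // an action materializes the persisted data
    df.unpersist()  // release it when done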

In addition to setting the storage level, you can optimize other things as well. Spark SQL uses a different caching mechanism, columnar storage, which is a more efficient way of caching data (as Spark SQL is schema-aware). There is a set of config properties that can be tuned to manage this caching, described in detail here (this is the latest version's documentation; refer to the documentation of the version you are using); a sketch of setting them follows the list below:

  • spark.sql.inMemoryColumnarStorage.compressed
  • spark.sql.inMemoryColumnarStorage.batchSize
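
For illustration, a sketch of setting those two properties on a SparkSession (the values shown are the documented defaults, not tuning recommendations):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("columnar-cache-config")
      .getOrCreate()

    // Compress the in-memory columnar cache based on statistics of the data
    spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")

    // Rows per columnar batch: larger batches improve memory utilization
    // and compression, but risk OOM errors when caching very wide rows
    spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")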
