
Error: You must build Spark with Hive

I am running Spark 1.6.2 with Hive 0.13.1 and Hadoop 2.6.0.

I try to run the following pyspark script:

import pyspark
from pyspark.sql import HiveContext

sc = pyspark.SparkContext('local[*]')
hc = HiveContext(sc)
hc.sql("select col from table limit 3")

using the following command line:

 ~/spark/bin/spark-submit script.py 

I get this error message:

 File "/usr/local/hadoop/spark/python/pyspark/sql/context.py", line >552, in sql
 return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
 File "/usr/local/hadoop/spark/python/pyspark/sql/context.py", line >660, in _ssql_ctx
 "build/sbt assembly", e)
 Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while >calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject >id=o18))

Doing as it asked, I saw a warning saying that exporting SPARK_HIVE is deprecated and that I should use '-Phive -Phive-thriftserver' instead, so I did:

 cd ~/spark/
 build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver assembly
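
One quick way to verify that such a rebuild really produced a Hive-enabled assembly (a sanity check only, assuming the default Spark 1.6 sbt output location for Scala 2.10) is to look for the HiveContext class inside the assembly jar:

 jar tf assembly/target/scala-2.10/spark-assembly-*.jar | grep org/apache/spark/sql/hive/HiveContext

If grep prints nothing, the jar was built without Hive support; it is also worth checking that spark-submit is launched from the rebuilt ~/spark rather than an older installation, since the tracebacks here point at /usr/local/hadoop/spark.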

But I got much the same error:

 [...]
 16/07/17 19:10:01 WARN metadata.Hive: Failed to access metastore. This class should not accessed in runtime.
 org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
     at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
 [...]
 Traceback (most recent call last):
   File "/home/hadoop/spark3/./script.py", line 6, in <module>
     hc.sql("select timestats from logweb limit 3")
   File "/usr/local/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 552, in sql
   File "/usr/local/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 660, in _ssql_ctx
 Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o19))

I searched the web for this error, but found no answer that worked for me...

Can someone help me?


I also tried using the version of Spark that is supposed to work with Hadoop (as suggested by Joss), but I get this error:

 Traceback (most recent call last):
   File "/home/hadoop/spark3/./script.py", line 6, in <module>
     hc.sql("select timestats from logweb limit 3")
   File "/usr/local/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 552, in sql
   File "/usr/local/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 660, in _ssql_ctx
 Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o19))

I have a version of Apache Spark that comes with HiveContext by default; if you are interested, you can download it from the following link:

As for the problem you are facing, it may be related to the Hadoop version you used to compile Spark: check the build parameters that correspond to the Hadoop version you need.
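
For example (a sketch only; the hadoop.version property follows the standard Spark 1.6 build options, and 2.6.0 is assumed from the question), a build that pins the exact Hadoop version would look like:

 build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver assembly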
