
How to configure the Hive CLI when using the Spark execution engine?

I have set hive.execution.engine to spark and am using a spark-enabled queue. Spark SQL is able to access the Hive tables, and so is beeline from a directly connected cluster machine.
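
(For context, the access paths that already work look roughly like this; the host, port, and database in the beeline URL are illustrative assumptions:)

# Spark SQL reads the Hive tables directly
spark-sql -e "show tables in sb;"

# beeline reaches them through HiveServer2 from a cluster machine
beeline -u jdbc:hive2://<hiveserver2-host>:10000/default -e "select * from sb.test2 limit 5;"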

However, the hive CLI seems to require additional steps. So far the following have been done:

* copied the scala libraries to the $HIVE_HOME/lib directory (otherwise we get a ClassNotFoundException)

* ran the following at the start of the hive script (or placed it in .hiverc); a combined sketch of both steps appears after the snippet:

set hive.execution.engine=spark;
set mapred.job.queue.name=root.spark.sbg.hos;
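
For reference, a minimal sketch of those two setup steps, assuming a Spark 2.x layout; the jar names and paths are illustrative and should be matched to your actual Spark and Hive installs:

# Step 1: make the Scala runtime visible to Hive
# (jar versions depend on your Spark build -- adjust the glob as needed)
cp "$SPARK_HOME"/jars/scala-library-*.jar "$HIVE_HOME"/lib/

# Step 2: persist the engine and queue settings in ~/.hiverc
# so that every hive CLI session picks them up automatically
cat >> ~/.hiverc <<'EOF'
set hive.execution.engine=spark;
set mapred.job.queue.name=root.spark.sbg.hos;
EOF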

But now the following error occurs: Failed to create spark client.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
hive (default)> insert into sb.test2 values (1,'ab');
Query ID = sboesch_20171030175629_dc310c9a-519e-4f84-a632-f3a44f1df8c3
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

Has anyone successfully connected hive to the spark backend? I am using vanilla hive (not Cloudera, Hortonworks, or MapR).

You have to start the Hive Metastore Server separately in order to access the Hive tables through Spark.

Try hive --service metastore in a new terminal; you will get a response like "Starting Hive Metastore Server".
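
A minimal sketch of running it, assuming the Thrift default port 9083; clients then reach the metastore through the hive.metastore.uris property (e.g. thrift://localhost:9083) in hive-site.xml:

# Start the metastore as its own long-running service
# (listens on the Thrift default port 9083 unless overridden)
hive --service metastore &

# Quick sanity check that it is listening (assumes netcat is installed)
nc -z localhost 9083 && echo "metastore is up"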

hive-site.xml

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>**mysql metastore username**</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>**mysql metastore DB password**</value>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/tmp/hivequerylogs/${user.name}</value>
  </property>

  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/local/hive/apache-hive-2.1.1-bin/lib/hive-hbase-handler-2.1.1.jar,file:///usr/local/hive/apache-hive-2.1.1-bin/lib/zookeeper-3.4.6.jar</value>
    <description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
  </property>

  <property>
    <name>hive.support.concurrency</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>PAM</value>
  </property>

  <property>
    <name>hive.server2.custom.authentication.class</name>
    <value>org.apache.hive.service.auth.PamAuthenticationProvider</value>
  </property>

  <property>
    <name>hive.server2.authentication.pam.services</name>
    <value>sshd,sudo</value>
  </property>

  <property>
    <name>hive.stats.dbclass</name>
    <value>jdbc:mysql</value>
  </property>

  <property>
    <name>hive.stats.jdbcdriver</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>hive.session.history.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.optimize.insert.dest.volume</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive/${user.name}</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
    <description/>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
    <description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
  </property>

  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>

  <property>
    <name>datanucleus.schema.validateConstraints</name>
    <value>true</value>
  </property>

  <property>
    <name>datanucleus.schema.validateColumns</name>
    <value>true</value>
  </property>

  <property>
    <name>datanucleus.schema.validateTables</name>
    <value>true</value>
  </property>
</configuration>
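
Note that with datanucleus.autoCreateSchema set to false as above, the MySQL-backed metastore schema must already exist before the metastore server starts. A minimal sketch of the one-time initialization with Hive's schematool, which reads the connection settings from hive-site.xml:

# One-time creation of the metastore schema in MySQL
# (run this once, before the first metastore start)
"$HIVE_HOME"/bin/schematool -dbType mysql -initSchema

# Afterwards, verify the schema version instead of re-creating it
"$HIVE_HOME"/bin/schematool -dbType mysql -info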
