
Issue configuring Hive on Spark

I have downloaded spark-2.0.0-bin-hadoop2.7. Could anyone advise how to configure Hive on this and use it from the Scala console? At the moment I am able to run RDDs on files using Scala (the spark-shell console).

Put your hive-site.xml in the Spark conf directory.
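Once hive-site.xml is in place, a minimal sketch of using Hive from the Scala console (in Spark 2.0.0, spark-shell exposes a SparkSession named spark that picks up the Hive metastore from that file; mytable is a hypothetical table name):

$SPARK_HOME/bin/spark-shell

scala> // spark is a SparkSession wired to the Hive metastore via hive-site.xml
scala> spark.sql("SHOW TABLES").show()
scala> spark.sql("SELECT count(*) FROM mytable").show()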

Hive can support multiple execution engines, such as Tez and Spark. You can set the property in hive-site.xml:

<property>
<name>hive.execution.engine</name>
<value>spark</value>
<description>
 I am choosing Spark as the execution engine
</description>
</property>

Copy the spark-assembly jar to HIVE_HOME/lib.
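For example (a sketch; the lib path assumes a Spark 1.x distribution, which is where the assembly jar ships):

cp $SPARK_HOME/lib/spark-assembly-*.jar $HIVE_HOME/lib/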

Set SPARK_HOME.

Set the following properties:

set spark.master=<Spark Master URL>;
set spark.eventLog.enabled=true;
set spark.eventLog.dir=<Spark event log folder (must exist)>;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;

The above steps should suffice, I think.

Follow the official Hive on Spark documentation:

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

You can set the Spark engine in Hive using the following command:

set hive.execution.engine=spark;
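For example, from the Hive CLI (mytable is a hypothetical table; a query like this should confirm that jobs now launch on Spark):

hive> set hive.execution.engine=spark;
hive> select count(*) from mytable;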

or by adding it to hive-site.xml (refer to kanishka's post).

Then, prior to Hive 2.2.0, copy the spark-assembly jar to HIVE_HOME/lib.

Since Hive 2.2.0, Hive on Spark runs with Spark 2.0.0 and above, which doesn't have an assembly jar.

To run in YARN mode (either yarn-client or yarn-cluster), copy the following jars to HIVE_HOME/lib (see the sketch after the list).

scala-library

spark-core

spark-network-common
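A sketch of that copy step (paths assume a Spark 2.x layout, where the jars live under $SPARK_HOME/jars; version suffixes are globbed):

cp $SPARK_HOME/jars/scala-library-*.jar $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-core_*.jar $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-network-common_*.jar $HIVE_HOME/lib/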

Set SPARK_HOME:

export SPARK_HOME=/path-to-spark

Start the Spark master and workers:

spark-class org.apache.spark.deploy.master.Master

spark-class org.apache.spark.deploy.worker.Worker spark://MASTER_IP:PORT

Configure Spark:

set spark.master=<Spark Master URL>;
set spark.executor.memory=512m;
set spark.yarn.executor.memoryOverhead=<10-20% of spark.executor.memory>;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
