
Hive on Spark CDH 5.7 - Failed to create spark client

We are getting the following error while executing Hive queries with the Spark engine.

Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The following properties are set to use Spark as the execution engine instead of MapReduce:

set hive.execution.engine=spark;
set spark.executor.memory=2g;

I also tried changing the following properties:

set yarn.scheduler.maximum-allocation-mb=2048;
set yarn.nodemanager.resource.memory-mb=2048;
set spark.executor.cores=4;
set spark.executor.memory=4g;
set spark.yarn.executor.memoryOverhead=750;
set hive.spark.client.server.connect.timeout=900000ms;

Do I need to set some other properties? Can anyone suggest?

It seems like the YARN container memory was smaller than the Spark executor requirement. Please set the YARN container memory and maximum allocation to be greater than the Spark executor memory + overhead (a sizing sketch follows the list below):

  1. yarn.scheduler.maximum-allocation-mb
  2. yarn.nodemanager.resource.memory-mb
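
As a rough sketch using the values from the question (spark.executor.memory=4g and spark.yarn.executor.memoryOverhead=750), each executor container needs roughly 4096 MB + 750 MB ≈ 4846 MB, so both YARN limits have to be at least that large. The 6144 MB figure below is only an illustrative choice for yarn-site.xml, not a recommended value:

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6144</value> <!-- must be >= Spark executor memory + overhead (about 4846 MB here) -->
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6144</value> <!-- must be >= the largest container that will be requested -->
</property>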

yarn.nodemanager.resource.memory-mb:

Amount of physical memory, in MB, that can be allocated for containers. It means the amount of memory YARN can utilize on this node, and therefore this property should be lower than the total memory of that machine.

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value> <!-- 40 GB -->
</property>

The next step is to provide YARN guidance on how to break up the total resources available into containers. You do this by specifying the minimum unit of RAM to allocate for a container.

In yarn-site.xml:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value> <!-- RAM per container -->
</property>

yarn.scheduler.maximum-allocation-mb:

It defines the maximum memory allocation available for a container, in MB.

It means the ResourceManager can only allocate memory to containers in increments of yarn.scheduler.minimum-allocation-mb, without exceeding yarn.scheduler.maximum-allocation-mb, and it should not be more than the total memory allocated to the node.
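
As a worked example of that increment rule (using the illustrative values above): with yarn.scheduler.minimum-allocation-mb=2048, an executor request of about 4846 MB (4096 MB memory + 750 MB overhead) would be rounded up to the next 2048 MB increment, i.e. 6144 MB, and that rounded size must still fit within yarn.scheduler.maximum-allocation-mb.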

In yarn-site.xml:

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- max RAM per container -->
</property>

Also check the Spark History Server: go to the Spark on YARN service instance > History Server > History Server Web UI > click on the relevant job > click on the relevant failed job > click on the failed stages for that job and look for the "details" section.
