
Hive on Spark: always wrong executor_cores in job application from Spark Master web UI

I am trying to switch Hive 2.1.1 from MapReduce to Hive on Spark. As described on the Hive on Spark official site, I built Spark 1.6.0 (the Spark version referenced in the Hive 2.1.1 source POM) without Hive. Spark itself works fine with a spark-submit / spark-shell test.
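For context, the "without Hive" build followed the Hive on Spark getting-started instructions and looked roughly like the sketch below; the Hadoop profile and the spark-master host name are placeholders that depend on the actual cluster, so treat this as an outline rather than the exact commands:

```sh
# Build a Spark 1.6.0 distribution without the Hive jars, as the Hive on Spark
# guide recommends (the Hadoop profile depends on your cluster's Hadoop version)
cd spark-1.6.0
./make-distribution.sh --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

# Sanity check that the standalone cluster accepts jobs at all
# (spark://spark-master:7077 is a placeholder for the real master URL)
./bin/spark-submit --master spark://spark-master:7077 \
    --class org.apache.spark.examples.SparkPi \
    lib/spark-examples-*.jar 100
```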

I set spark.executor.cores / spark.executor.memory

in hive-site.xml, and also limit these two with

SPARK_WORKER_CORES / SPARK_WORKER_MEMORY

in spark-env.sh. But after I start a Hive query such as select count(*) from the Hive CLI, the job in the Spark Master web UI always shows 0 CPU cores, so the job is not executed and the Hive query waits forever in the CLI. The Spark cluster is set up in a Docker environment: each "server" is a Docker container running on a host that adds up to 160 cores / 160 GB of memory. Before I set SPARK_WORKER_CORES / SPARK_WORKER_MEMORY, 156 cores were always requested, which also led to failures because there were not enough resources. After I limited SPARK_WORKER_CORES / SPARK_WORKER_MEMORY to the resources assigned to the Docker container, 0 cores are requested.
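For reference, the two pieces of configuration described above look roughly like the sketch below; the core/memory numbers, the spark-master URL, and the table name are placeholders, not the exact values from my cluster:

```sh
# spark-env.sh on each worker container: cap what the worker advertises so it
# matches the resources actually granted to the Docker container (placeholder values)
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=16g

# The executor-side properties live in hive-site.xml, but they can also be set
# per session from the Hive CLI, which is convenient for experimenting:
hive -e "
set hive.execution.engine=spark;
set spark.master=spark://spark-master:7077;
set spark.executor.cores=2;
set spark.executor.memory=4g;
select count(*) from some_table;
"
```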

I have been stuck on this problem for 2 days without progress. I hope for some tips from anyone who is familiar with Hive on Docker, or who runs Hive/Spark in a Docker environment.

I don't think the Spark execution engine works well with Hive at all. The Hive version you are trying to integrate with Spark is built with Spark 2.0.0, not 1.6.0. There has been a lot of discussion about this before; see the thread here. You are better off using Tez, as many users report in that thread.
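For example, you can check which Spark version your Hive build declares and try the same query on Tez; a rough sketch is below (the source directory and the table name are placeholders, and Tez must already be installed and configured):

```sh
# Inside the Hive 2.1.1 source checkout: see which Spark version the POM declares
cd hive-2.1.1-src
grep -m1 "<spark.version>" pom.xml

# Try the same query on Tez instead of Spark (the engine can also be set
# permanently via hive.execution.engine in hive-site.xml)
hive -e "set hive.execution.engine=tez; select count(*) from some_table;"
```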
