Tez crashes on a Hadoop-2.5.2 cluster

Question

I successfully built Tez-0.6.0 against Hadoop-2.5.2.

Then I configured Tez-0.6.0 as described in http://tez.apache.org/install.html.

I moved the Tez lib package to a location on HDFS and updated my tez-site.xml:

<property>
    <name>tez.lib.uris</name>
    <value>${fs.default.name}/apps/Tez/,${fs.default.name}/apps/Tez/lib/</value>
</property>

After that I tried the Tez sample test:

hadoop jar tez-examples-0.6.0.jar orderedwordcount <input> <output>

But I get the following error when running this command:

Running OrderedWordCount
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/04/15 10:47:57 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.6.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2015-04-15T01:13:02Z ]
15/04/15 10:48:00 INFO client.TezClient: Submitting DAG application with id: application_1429073725727_0005
15/04/15 10:48:00 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
15/04/15 10:48:00 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://HA-Cluster/apps/Tez/,hdfs://HA-Cluster/apps/Tez/lib/
15/04/15 10:48:01 INFO client.TezClient: Stage directory /tmp/app/tez/staging doesn't exist and is created
15/04/15 10:48:01 INFO client.TezClient: Tez system stage directory hdfs://HA-cluster/tmp/app/tez/staging/.tez/application_1429073725727_0005 doesn't exist and is created
15/04/15 10:48:02 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1429073725727_0005, dagName=OrderedWordCount
15/04/15 10:48:03 INFO impl.YarnClientImpl: Submitted application application_1429073725727_0005
15/04/15 10:48:03 INFO client.TezClient: The url to track the Tez AM: http://syncserver34:8088/proxy/application_1429073725727_0005/
15/04/15 10:48:03 INFO client.DAGClientImpl: Waiting for DAG to start running
15/04/15 10:48:09 INFO client.DAGClientImpl: DAG completed. FinalState=FAILED
OrderedWordCount failed with diagnostics: [Application application_1429073725727_0005 failed 2 times due to AM Container for appattempt_1429073725727_0005_000002 exited with  exitCode: -1073741515 due to: Exception from container-launch: ExitCodeException exitCode=-1073741515:
ExitCodeException exitCode=-1073741515:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

        1 file(s) moved.

Container exited with a non-zero exit code -1073741515
.Failing this attempt.. Failing the application.]

In the ResourceManager log, I see:

15/04/15 12:56:15 ERROR scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1429082271173_0001_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: MasterNode
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:199)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:425)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:248)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:736)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:816)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:809)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:649)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:761)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:742)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.UnknownHostException: MasterNode
        ... 19 more

The problem might be that, while connecting to the NodeManager, it is unable to handshake with the ResourceManager.

If I try it on a single-node Hadoop cluster, it works correctly.
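The ResourceManager trace above ends in java.net.UnknownHostException: MasterNode, thrown while the RM builds a container token (inside BuilderUtils.newContainerToken and SecurityUtil.buildTokenService). That suggests the RM host cannot resolve MasterNode, which appears to be the hostname a NodeManager registered under. If so, this is a name-resolution problem (for example a missing hosts-file or DNS entry) rather than a failed handshake, and it would also fit the observation that a single-node setup works. One way to check is to resolve the name with a JVM on the ResourceManager machine. The class below is a minimal, hypothetical sketch (HostCheck is not part of Hadoop) that uses only java.net.InetAddress:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    // Resolve a hostname the way a JVM on this machine would.
    // Returns the IP address as text, or null if the name cannot be resolved.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the hostname from the ResourceManager log; pass other
        // cluster hostnames as arguments to check them as well.
        String[] hosts = args.length > 0 ? args : new String[] {"MasterNode"};
        for (String host : hosts) {
            String ip = resolve(host);
            System.out.println(host + " -> " + (ip == null ? "UNRESOLVED" : ip));
        }
    }
}
```

Run it on the ResourceManager host with every NodeManager hostname as an argument; any UNRESOLVED entry would point at the hosts/DNS configuration of that machine.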

Answer 1

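Try adding the following property to yarn-site.xml, so that the NodeManager keeps each application's localized files and logs for 1200 seconds after the application finishes instead of deleting them immediately:

<property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>1200</value>
</property>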

With those files kept, run the "launch_container.cmd" script located in the hadoop \tmp..\appcache directory. On the Windows platform this reveals a problem loading a DLL needed to run MapReduce: MSVCR100.dll is missing, so the Tez job cannot start. The error is:

"The program can't start because MSVCR100.dll is missing from your computer. Try reinstalling the program to fix this issue"

Give full privileges on the hadoop-tmp directory, and try replacing or moving the msvcr100.dll file in C:\Windows\System32 on the Windows machine so that the MapReduce program for the Tez job can run.
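The exit code in the question's diagnostics, -1073741515, is a Windows NTSTATUS value printed as a signed 32-bit integer. In hexadecimal it is 0xC0000135, STATUS_DLL_NOT_FOUND, which is consistent with the missing-DLL diagnosis above (MSVCR100.dll belongs to the Microsoft Visual C++ 2010 runtime). The hypothetical ExitCodeDecoder class below is a small sketch of that conversion; its lookup table is a hand-picked subset of well-known NTSTATUS codes, not a complete list:

```java
import java.util.Map;

public class ExitCodeDecoder {
    // A few well-known NTSTATUS codes that Windows can return as a process
    // exit code when a program fails to start or crashes (subset only).
    private static final Map<Integer, String> KNOWN = Map.of(
            0xC0000005, "STATUS_ACCESS_VIOLATION",
            0xC000007B, "STATUS_INVALID_IMAGE_FORMAT",
            0xC0000135, "STATUS_DLL_NOT_FOUND",
            0xC0000139, "STATUS_ENTRYPOINT_NOT_FOUND");

    // YARN reports the exit code as a signed int; %X formats a negative int
    // as its unsigned 32-bit two's-complement value.
    static String toHex(int exitCode) {
        return String.format("0x%08X", exitCode);
    }

    static String describe(int exitCode) {
        return toHex(exitCode) + " " + KNOWN.getOrDefault(exitCode, "(not in table)");
    }

    public static void main(String[] args) {
        int code = -1073741515; // exit code from the container diagnostics
        System.out.println(code + " = " + describe(code));
    }
}
```

Running it prints -1073741515 = 0xC0000135 STATUS_DLL_NOT_FOUND.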

Answer 1: score 0, posted 2015-04-22 04:34:46
