
flink 1.12.1 example application failing on a single node yarn cluster

I am trying out the flink example as described in the flink docs, on a single node yarn cluster.

As mentioned in this discussion, HADOOP_CONF_DIR is also set as follows before executing the yarn command.

export HADOOP_CONF_DIR=/etc/hadoop/conf
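
Before submitting, it can help to confirm that the directory named by HADOOP_CONF_DIR actually contains the Hadoop client configuration. The following is a minimal sketch, not part of the original setup: the function name missing_hadoop_conf is made up for illustration, and it only assumes that a usable client configuration includes core-site.xml and yarn-site.xml.

```shell
# Hypothetical helper: print the expected Hadoop config files that are
# missing from a configuration directory (defaults to $HADOOP_CONF_DIR).
# Prints nothing when both files are present.
missing_hadoop_conf() {
    conf_dir=${1:-${HADOOP_CONF_DIR:-}}
    for name in core-site.xml yarn-site.xml; do
        # Report each file that is not a regular file in conf_dir.
        [ -f "$conf_dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Calling missing_hadoop_conf /etc/hadoop/conf should print nothing when both files are in place. An empty result only shows that the files exist, not that their contents are correct.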

On executing the command below

ubuntu@vrni-platform:~/build-target/flink$ ./bin/flink run-application -t yarn-application  ./examples/streaming/TopSpeedWindowing.jar

it fails with the following error

 The program finished with the following exception:

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn Application Cluster
    at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:465)
    at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
    at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:213)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1061)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1614159836384_0045 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1614159836384_0045_000001 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2021-02-24 16:19:39.409]File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I have set the log level to DEBUG and do see that flink-dist_2.12-1.12.1.jar is being copied to /home/ubuntu/.flink/application_1614159836384_0045

2021-02-24 16:19:37,768 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader            [] - Got modification time 1614183577000 from remote path file:/home/ubuntu/.flink/application_1614159836384_0045/TopSpeedWindowing.jar
2021-02-24 16:19:37,769 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader            [] - Copying from file:/home/ubuntu/build-target/flink/lib/flink-dist_2.12-1.12.1.jar to file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar with replication factor 1
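
The target path in the log line above starts with file:, which suggests the jars are being staged on the local filesystem rather than on a distributed one such as HDFS. In that case every process that later localizes the jar has to be able to read that same local path. As a rough sketch, the scheme of such a path can be pulled out like this (uri_scheme is a hypothetical helper name, not a Flink or Hadoop command):

```shell
# Hypothetical helper: print the URI scheme of a path as it appears in
# the logs, or "none" for a plain path without a scheme.
uri_scheme() {
    case $1 in
        /*)  printf '%s\n' "none" ;;          # absolute path, no scheme
        *:*) printf '%s\n' "${1%%:*}" ;;      # text before the first ":"
        *)   printf '%s\n' "none" ;;
    esac
}
```

For the path in the log above this gives file, whereas a staging path on HDFS would give hdfs.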

I have put the entire DEBUG log here

The NodeManager log has the following warning

2021-02-24 16:36:34,219 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1614159836384_0047
2021-02-24 16:36:34,220 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1614159836384_0047_01_000001
2021-02-24 16:36:34,222 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1614159836384_0047_01_000001.tokens
2021-02-24 16:36:34,222 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user ubuntu
2021-02-24 16:36:34,224 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1614159836384_0047_01_000001.tokens to /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1614159836384_0047/container_1614159836384_0047_01_000001.tokens
2021-02-24 16:36:34,224 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1614159836384_0047 = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1614159836384_0047
2021-02-24 16:36:34,247 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-02-24 16:36:34,268 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: { file:/home/ubuntu/.flink/application_1614159836384_0047/flink-dist_2.12-1.12.1.jar, 1614184593000, FILE, null } failed: File file:/home/ubuntu/.flink/application_1614159836384_0047/flink-dist_2.12-1.12.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ubuntu/.flink/application_1614159836384_0047/flink-dist_2.12-1.12.1.jar does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
        at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

The entire NodeManager log is here

Can someone let me know what is going wrong? Does flink not support a single node yarn cluster for development?

  • Flink version 1.12.1

There was a configuration issue in my setup. In my setup hadoop-yarn-nodemanager was running as the yarn user.

ubuntu@vrni-platform:/tmp/flink$ ps -ef | grep nodemanager
yarn      4953     1  2 05:53 ?        00:11:26 /usr/lib/jvm/java-8-openjdk/bin/java -Dproc_nodemanager -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/heap-dumps/yarn -XX:+ExitOnOutOfMemoryError -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop-yarn-nodemanager-vrni-platform.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Xmx512m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-yarn-nodemanager-vrni-platform.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager

I was executing the ./bin/flink command as the ubuntu user, and in my setup the yarn user does not have permission to write to ubuntu's home folder.

ubuntu@vrni-platform:/tmp/flink$ echo ~ubuntu
/home/ubuntu
ubuntu@vrni-platform:/tmp/flink$ echo ~yarn
/var/lib/hadoop-yarn

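It looks like flink needs permission to write to the user's home directory to create the .flink folder, even when the job is submitted to yarn. It works fine for me if I run flink as the yarn user in my setup.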
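
One way to check this kind of diagnosis is to walk the parent directories of the staged jar and find the first one the current user cannot enter. The sketch below is only an illustration under stated assumptions: first_blocked_dir is a made-up helper name, it expects an absolute path, and it tests access for whichever user runs it, so it would need to be run as the NodeManager user (yarn in this setup).

```shell
# Hypothetical helper: print the first ancestor directory of an absolute
# path that is missing or not traversable by the current user.
# Prints nothing and returns 0 when every ancestor can be entered.
first_blocked_dir() {
    rest=$(dirname -- "$1")
    current=""
    while [ -n "$rest" ]; do
        rest=${rest#/}                 # drop one leading slash
        part=${rest%%/*}               # next path component
        case $rest in
            */*) rest=${rest#*/} ;;    # keep the remaining components
            *)   rest="" ;;            # that was the last component
        esac
        [ -z "$part" ] && continue     # skip empty parts from "//"
        current="$current/$part"
        if [ ! -d "$current" ] || [ ! -x "$current" ]; then
            printf '%s\n' "$current"
            return 1
        fi
    done
    return 0
}
```

If the function is saved in a file (say check_access.sh, a name chosen here for illustration), it can be sourced in a shell started with sudo -u yarn and called with the jar path from the error message. A result of /home/ubuntu would point at the home directory permissions described above.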
