java.io.FileNotFoundException when running Spark in cluster mode using YARN
I have a Spark application that runs as expected on a single node. I am now using YARN to run it across multiple nodes. However, this fails with a file-not-found exception. I first changed the file path from a relative to an absolute path, but the error persisted. I then read that it may be necessary to prefix the path with `file://` in case the default scheme is HDFS. The file in question is JSON. Despite using the absolute path and prefixing it with `file://`, the error persists:
16/11/10 10:19:56 INFO yarn.Client: client token: N/A diagnostics: User class threw exception: java.io.FileNotFoundException: file://absolute/dir/file.json (No such file or directory)
Why does this work correctly on one node but not in cluster mode with YARN?
You're missing a slash (`/`). Try:

file:///absolute/dir/file.json
The `file://` prefix specifies the local file system, and the absolute path after it must itself begin with a forward slash, so three forward slashes are required in total.
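The three-slash requirement follows from generic URI syntax: the text between `//` and the next `/` is the authority (host), not part of the path. A quick sketch using Python's standard library shows how the two forms parse, and why `file://absolute/dir/file.json` loses the first path component:

```python
from urllib.parse import urlparse

# With only two slashes, "absolute" is parsed as the host (authority),
# so the path that remains is /dir/file.json, which does not exist.
two_slashes = urlparse("file://absolute/dir/file.json")
print(two_slashes.netloc, two_slashes.path)   # absolute /dir/file.json

# With three slashes, the authority is empty and the full absolute
# path /absolute/dir/file.json is preserved.
three_slashes = urlparse("file:///absolute/dir/file.json")
print(three_slashes.netloc, three_slashes.path)  # '' /absolute/dir/file.json
```

This is the same parsing Hadoop's `FileSystem` layer applies to the path, which is why the exception message shows the mangled `file://absolute/dir/file.json`.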