hdfs -get folder error

I met an error when running the hdfs dfs -get command, like this:

[work@myserver ~]$ hdfs dfs -get hdfs://hadoopserver:8020/path/DataLoad/
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.fs.FsShell.displayError(FsShell.java:304)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:289)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

The "hadoopserver" is the master of a hadoop cluster which is not in the same network with myserver. “ hadoopserver”是hadoop群集的主服务器,该群集与myserver不在同一网络中。 I connect it by config public network IP in "/etc/hosts". 我通过“ / etc / hosts”中的配置公用网络IP连接它。

What's more, this error only happens when getting a folder, and only from outside the cluster's network. Look at the two examples below: in the first one I get the folder from another server that is on the same network as hadoopserver, and in the second I get a single file instead of a folder from the same server (myserver). Both succeed.

[work@hadoopserver_2 lewis]$ hdfs dfs -get hdfs://hadoopserver:8020/path/DataLoad ./
[work@hadoopserver_2 lewis]$ du -sh DataLoad/
 1.2G   DataLoad/

[work@myserver ~]$ hdfs dfs -get hdfs://hadoopserver:8020/path/DataLoad/part-r-00375-724f4a2e-ed40-4100-8e81-6657d9dacc01.gz.parquet ./
[work@myserver ~]$ ls
part-r-00375-724f4a2e-ed40-4100-8e81-6657d9dacc01.gz.parquet

To add one last point: when I get the folder from the outer network and hit this error, the folder is always created, and sometimes it contains incomplete files.

[work@myserver DataLoad]$ ls
part-r-00000-724f4a2e-ed40-4100-8e81-6657d9dacc01.gz.parquet  _SUCCESS
[work@myserver DataLoad]$ du -sh
10M .

(There should be many files in DataLoad, totaling much more than 10M. Most of the time there is only an empty file named "_SUCCESS".)

My workmate has solved this problem. We used hdfs --loglevel DEBUG to show the detailed error. The root cause is that I had only configured the master server's IP in /etc/hosts. When myserver tries to get data from the Hadoop cluster over the outer network, the master server (NameNode, default port 8020) only tells the client which datanode (data transfer port, default 50010) it should connect to for each block. So I couldn't get the whole folder, because its files are spread across different datanodes. Getting the single file succeeded probably because it happened to be on the master node (the only server configured in my hosts file).
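For reference, the debug run looked roughly like this; --loglevel is a global option of the hdfs script (available in newer Hadoop releases) and goes before the dfs subcommand:

[work@myserver ~]$ hdfs --loglevel DEBUG dfs -get hdfs://hadoopserver:8020/path/DataLoad/ ./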

So after I configured all nodes' public IPs in /etc/hosts, I could get the folder successfully.
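A rough sketch of the fixed /etc/hosts on myserver; all public IPs and the datanode hostnames below are placeholders, and the hostnames must match what the NameNode reports back to the client:

203.0.113.10   hadoopserver   # master / NameNode
203.0.113.11   datanode01     # datanode (placeholder hostname)
203.0.113.12   datanode02     # datanode (placeholder hostname)
203.0.113.13   datanode03     # datanode (placeholder hostname)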
