Why does checking whether a file exists in Hadoop cause a NullPointerException?

Question

I'm trying to create or open a file to store some output in HDFS, but I get a NullPointerException when I call the exists method in the second-to-last line of the code snippet below:

DistributedFileSystem dfs = new DistributedFileSystem();
Path path = new Path("/user/hadoop-user/bar.txt");
if (!dfs.exists(path)) dfs.createNewFile(path);
FSDataOutputStream dos = dfs.create(path);

Here is the stack trace:

java.lang.NullPointerException
        at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
        at ClickViewSessions$ClickViewSessionsMapper.map(ClickViewSessions.java:80)
        at ClickViewSessions$ClickViewSessionsMapper.map(ClickViewSessions.java:65)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

What could the problem be?

Answer 1

I think the preferred way of doing this is:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://mynamenodehost:9000");
FileSystem fs = FileSystem.get(conf);
Path path = ...

That way you don't tie your code to a particular implementation of FileSystem; plus, you don't have to worry about how each implementation of FileSystem is initialized.

Answer 2

The default constructor DistributedFileSystem() does not perform initialization; you need to call dfs.initialize() explicitly.

The reason you are getting a NullPointerException is that DistributedFileSystem internally uses an instance of DFSClient. Since you did not call initialize(), that instance is null. getFileStatus() calls dfsClient.getFileInfo(getPathName(f)), which throws a NullPointerException because dfsClient is null.

See https://trac.declarativity.net/browser/src/hdfs/org/apache/hadoop/dfs/DistributedFileSystem.java?rev=3593
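
The failure mode described above can be reproduced without a cluster. What follows is a minimal sketch, not Hadoop's actual code: SketchClient and SketchFileSystem are hypothetical stand-ins for DFSClient and DistributedFileSystem, and they model only the one detail that matters here, namely a field that stays null until initialize() is called.

```java
public class UninitializedClientSketch {

    // Hypothetical stand-in for DFSClient. In this toy model no file exists.
    static class SketchClient {
        String getFileInfo(String path) {
            return null;
        }
    }

    // Hypothetical stand-in for DistributedFileSystem.
    static class SketchFileSystem {
        // Not set by the constructor; stays null until initialize() runs.
        private SketchClient client;

        void initialize() {
            client = new SketchClient();
        }

        boolean exists(String path) {
            // Dereferences client, so this throws NullPointerException
            // when initialize() was never called.
            return client.getFileInfo(path) != null;
        }
    }

    // Returns true if calling exists() throws a NullPointerException.
    public static boolean existsThrowsNpe(boolean initializeFirst) {
        SketchFileSystem fs = new SketchFileSystem();
        if (initializeFirst) {
            fs.initialize();
        }
        try {
            fs.exists("/user/hadoop-user/bar.txt");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without initialize(): NPE = " + existsThrowsNpe(false));
        System.out.println("with initialize():    NPE = " + existsThrowsNpe(true));
    }
}
```

In this sketch, existsThrowsNpe(false) returns true and existsThrowsNpe(true) returns false, which mirrors the difference between constructing the file system directly and initializing it first.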

Answer 3

This should work:

DistributedFileSystem dfs = new DistributedFileSystem();
dfs.initialize(new URI("URI to HDFS"), new Configuration());
Path path = new Path("/user/hadoop-user/bar.txt");
if (!dfs.exists(path)) dfs.createNewFile(path);
FSDataOutputStream dos = dfs.create(path);

Answer 1: score 10, accepted, 2011-01-18 21:06:17
Answer 2: score 8, 2011-01-18 19:31:40
Answer 3: score 1, 2013-10-22 08:06:23
