ListFiles from HDFS Cluster
I am an amateur with Hadoop and related tools. I am trying to access a Hadoop cluster (HDFS) and retrieve the list of files from a client running in Eclipse. After setting up the required configuration on the Hadoop Java client, I can perform copyFromLocalFile and copyToLocalFile operations against HDFS from the client. Here's the problem I am facing: when I call the listFiles() method, I get
org.apache.hadoop.fs.LocatedFileStatus@d0085360
org.apache.hadoop.fs.LocatedFileStatus@b7aa29bf
Main method:
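import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Inside main(String[] args) throws IOException:
Configuration conf = new Configuration(); // replaces the original Properties + toConfiguration(props) helper
conf.set("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020");
conf.set("mapreduce.jobtracker.address", "<IPOFCLUSTER>:8032");
conf.set("yarn.resourcemanager.address", "<IPOFCLUSTER>:8032");
conf.set("mapreduce.framework.name", "yarn");
FileSystem fs = FileSystem.get(conf); // connects to the file system named by fs.defaultFS
Path p4 = new Path("/user/myusername/inputjson1/");
// true = also descend into subdirectories; only files (not directories) are returned
RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(p4, true);
while (ritr.hasNext())
{
    // prints the status object itself, not its file name, hence output like LocatedFileStatus@d0085360
    System.out.println(ritr.next().toString());
}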
I have also tried FileContext, but again I only got the FileStatus object's string representation. Is it possible to get the file names while iterating over the remote HDFS directory? There is a method called getPath(); is that the only way to retrieve the full path of the files using the Hadoop API, or is there another method that retrieves only the names of the files in a specified directory? Please help me with this. Thanks.
You can indeed use getPath(). It returns a Path object, which lets you query the name of the file:
Path p = ritr.next().getPath();
// returns the file name (the last path component), or the directory name if the path is a directory
String name = p.getName();
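The getPath().getName() approach from the answer can be wrapped in a small helper that returns only the file names. This is a minimal sketch, assuming Hadoop 2.x or later client jars on the classpath (listFiles and LocatedFileStatus are not in 1.x). The class name ListFileNames and the method listFileNames are hypothetical. To make it runnable without a cluster, main uses FileSystem.getLocal on a temporary directory; against HDFS you would instead obtain fs with FileSystem.get(conf) using the fs.defaultFS setting from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListFileNames {

    // Hypothetical helper: collects only the names (last path component) of the
    // files under dir; recursive = true also descends into subdirectories.
    public static List<String> listFileNames(FileSystem fs, Path dir, boolean recursive)
            throws IOException {
        List<String> names = new ArrayList<>();
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, recursive);
        while (it.hasNext()) {
            Path p = it.next().getPath(); // fully qualified, e.g. hdfs://host:8020/user/.../a.json
            names.add(p.getName());       // just the final component, e.g. "a.json"
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Local file system stands in for HDFS here, so no cluster is needed.
        FileSystem fs = FileSystem.getLocal(new Configuration());
        java.nio.file.Path tmp = Files.createTempDirectory("listnames");
        Files.createFile(tmp.resolve("a.json"));
        Files.createDirectory(tmp.resolve("sub"));
        Files.createFile(tmp.resolve("sub").resolve("b.json"));

        List<String> names = listFileNames(fs, new Path(tmp.toUri()), true);
        Collections.sort(names); // listing order is not guaranteed, so sort for stable output
        System.out.println(names); // prints [a.json, b.json]
    }
}
```

If you need the full path instead, keep the Path returned by getPath() (or call toString() on it) rather than getName().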
The FileStatus object you get can also tell you whether the entry is a file or a directory, via isFile() and isDirectory().
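To list only the immediate contents of one directory (files and subdirectories alike), FileSystem.listStatus can be combined with FileStatus.isDirectory(). This is a minimal sketch, assuming Hadoop 2.x or later client jars; the class ListDirEntries, the method describeChildren, and the "dir:"/"file:" labels are hypothetical. As in the other sketch, main runs against the local file system through FileSystem.getLocal so it works without a cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDirEntries {

    // Hypothetical helper: lists the direct children of dir (not recursive) and
    // labels each name as "dir:" or "file:" using FileStatus.isDirectory().
    public static List<String> describeChildren(FileSystem fs, Path dir) throws IOException {
        List<String> out = new ArrayList<>();
        for (FileStatus st : fs.listStatus(dir)) {
            String kind = st.isDirectory() ? "dir" : "file";
            out.add(kind + ":" + st.getPath().getName());
        }
        Collections.sort(out); // listing order is not guaranteed, so sort for stable output
        return out;
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        java.nio.file.Path tmp = Files.createTempDirectory("listdir");
        Files.createFile(tmp.resolve("a.json"));
        Files.createDirectory(tmp.resolve("sub"));
        Files.createFile(tmp.resolve("sub").resolve("b.json")); // not listed: listStatus does not recurse

        System.out.println(describeChildren(fs, new Path(tmp.toUri()))); // prints [dir:sub, file:a.json]
    }
}
```

Unlike listFiles(path, true), listStatus returns directories as entries of their own and does not descend into them.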
Here is more API documentation:
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/Path.html
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/FileStatus.html