I have several JSON files (zipped with .gz format) in some HDFS directories in a tree like:
/master/dir1/file1.gz
/master/dir2/file2.gz
/master/dir3/file3.gz
...
I need to read those files from the path /master/ and join them into an RDD with Spark in Java. How could I do it?
[Edit] If
JavaRDD<String> textFile = sc.textFile("hdfs://master/dir*/file*");
doesn't work, another way is to list the directories matching the pattern and union the per-directory RDDs:
// listStatus does not expand globs; use globStatus, then union the RDDs
FileStatus[] statuses = fileSystem.globStatus(new Path("hdfs://master/dir*"));
JavaRDD<String> all = Arrays.stream(statuses)
        .filter(FileStatus::isDirectory)
        .map(s -> sc.textFile(s.getPath().toString()))
        .reduce(JavaRDD::union)
        .orElseThrow(() -> new IllegalStateException("no matching directories"));
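For context, Spark's textFile (via Hadoop) decompresses .gz files transparently, one file per (non-splittable) partition, so no explicit gunzip step is needed. To show the same "walk a tree, decompress each .gz, gather all lines" idea offline, here is a plain-JDK sketch with no Spark or HDFS dependency; the class and method names (`GzTreeReader`, `readAllGzLines`) are mine, not from any library:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;
import java.util.zip.*;

public class GzTreeReader {

    // Read every *.gz file under `master`, decompress it, and collect all
    // lines into one list -- a local analogue of reading "dir*/file*" on HDFS.
    static List<String> readAllGzLines(Path master) throws IOException {
        try (Stream<Path> files = Files.walk(master)) {
            return files
                    .filter(p -> p.toString().endsWith(".gz"))
                    .sorted()
                    .flatMap(GzTreeReader::linesOfGz)
                    .collect(Collectors.toList());
        }
    }

    // Decompress one .gz file and stream its lines; the reader is closed
    // when the stream is closed (flatMap closes inner streams for us).
    static Stream<String> linesOfGz(Path p) {
        try {
            BufferedReader r = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(Files.newInputStream(p))));
            return r.lines().onClose(() -> {
                try { r.close(); } catch (IOException ignored) {}
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a small gzipped text file, for the demo below.
    static void writeGz(Path p, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(p)))) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway tree shaped like the one in the question.
        Path master = Files.createTempDirectory("master");
        Files.createDirectories(master.resolve("dir1"));
        Files.createDirectories(master.resolve("dir2"));
        writeGz(master.resolve("dir1/file1.gz"), "{\"a\":1}\n");
        writeGz(master.resolve("dir2/file2.gz"), "{\"b\":2}\n");

        System.out.println(readAllGzLines(master));
    }
}
```

In Spark the equivalent result comes back already partitioned across the cluster; the point of the sketch is only the traversal-plus-transparent-decompression behavior you get for free from textFile.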