HDFS file access in Spark
I am developing an application where I read a file from Hadoop, process it, and store the data back to Hadoop. I am confused about what the proper HDFS file path format should be.

When reading an HDFS file from the Spark shell like
val file=sc.textFile("hdfs:///datastore/events.txt")
it works fine and I am able to read it.
But when I submit a jar containing the same code to YARN, it gives an error saying
org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/datastore/events.txt
When I add the name node IP as hdfs://namenodeserver/datastore/events.txt, everything works.
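The "Uri without authority" message is consistent with how java.net.URI (which Hadoop's Path handling builds on) parses these two forms: a triple-slash URI has an empty authority component, so HDFS has to fill in the namenode from fs.defaultFS, while the second form carries the namenode explicitly. A minimal sketch of the difference:

```scala
import java.net.URI

// "hdfs:///path" has nothing between the 2nd and 3rd slash,
// so its authority component is empty (null in java.net.URI).
val noAuthority = new URI("hdfs:///datastore/events.txt")

// "hdfs://namenodeserver/path" carries the namenode as the authority.
val withAuthority = new URI("hdfs://namenodeserver/datastore/events.txt")

println(noAuthority.getAuthority)    // null
println(withAuthority.getAuthority)  // namenodeserver
println(withAuthority.getPath)       // /datastore/events.txt
```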
I am a bit confused about this behaviour and need some guidance.
Note: I am using an AWS EMR setup and all the configurations are default.
If you want to use sc.textFile("hdfs://...") you need to give the full (absolute) path; in your example that would be "nn1home:8020/.."
If you want to make it simple, just use sc.textFile("hdfs:/input/war-and-peace.txt")

That's only one /. I think it will work.
Problem solved. As I debugged further, the fs.defaultFS property from core-site.xml was not used when I just passed the path as hdfs:///path/to/file. But all the Hadoop config properties were loaded (as I saw when I logged the sparkContext.hadoopConfiguration object).
As a workaround, I manually read the property with sparkContext.hadoopConfiguration().get("fs.defaultFS") and prepended it to the path. I don't know whether this is the correct way of doing it.
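That workaround can be sketched as follows. The defaultFS value here is a hard-coded stand-in for what sparkContext.hadoopConfiguration().get("fs.defaultFS") would return on a real cluster; namenodeserver:8020 is a made-up example, and the helper name is hypothetical:

```scala
// Stand-in for sparkContext.hadoopConfiguration().get("fs.defaultFS");
// on a real cluster this value comes from core-site.xml.
val defaultFS = "hdfs://namenodeserver:8020"  // hypothetical value

// Prepend the default filesystem to a path whose URI lacks an authority,
// e.g. "hdfs:///datastore/events.txt" or "/datastore/events.txt".
// (Real code should pass through paths that already carry an authority;
// this sketch only handles the authority-less case.)
def withDefaultFS(path: String): String = {
  val bare = path.stripPrefix("hdfs:").dropWhile(_ == '/')
  s"${defaultFS.stripSuffix("/")}/$bare"
}

println(withDefaultFS("hdfs:///datastore/events.txt"))
// hdfs://namenodeserver:8020/datastore/events.txt
```

The resulting fully-qualified string is what would then be handed to sc.textFile.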