Issue while connecting to Amazon S3 using PySpark
I am using Spark 1.6 in local mode. The following is my code:
First Attempt:
airline = sc.textFile("s3n://mortar-example-data/airline-data")
airline.take(2)
Second Attempt:
airline = sc.textFile("s3n://myid:mykey@mortar-example-data/airline-data")
airline.take(2)
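(A side note on the second attempt: embedding credentials directly in the `s3n://` URL breaks if the secret key contains characters such as `/` or `+`, which is a common reason this form fails even with valid keys. A minimal sketch of percent-encoding the secret first; the credential values below are placeholders, not real keys:)

```python
from urllib.parse import quote

access_key = "AKIAEXAMPLE"   # placeholder access key
secret_key = "abc/def+ghi"   # placeholder secret; often contains '/' or '+'

# Percent-encode the secret so the s3n:// URL parses correctly
url = "s3n://{}:{}@mortar-example-data/airline-data".format(
    access_key, quote(secret_key, safe=""))
```

Setting the credentials on the Hadoop configuration (as in the answer below) avoids the URL-escaping issue entirely.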
The above code throws the following error:
Py4JJavaError: An error occurred while calling o17.partitions.
: java.io.IOException: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
Not sure what is missing here to connect to S3. It would be great if someone could point it out.
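(For readers hitting the same trace: `No FileSystem for scheme: s3n` typically means the `hadoop-aws` classes, which provide the `s3n` filesystem implementation, are not on the classpath at all, so credential settings alone cannot fix it. A possible way to pull them in at submit time, assuming Spark 1.6 built against Hadoop 2.x; the version coordinate below is illustrative and should match your Hadoop version:)

```shell
# Fetch the hadoop-aws module (and its AWS SDK dependency) at launch;
# the same --packages flag also works with the pyspark shell.
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.3 my_job.py
```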
@John
The following is my solution:
bucket = "your bucket"
# Prod App Key
prefix = "Your path to the file"
filename = "s3n://{}/{}".format(bucket, prefix)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "YourAccessKey")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "YourSecretKey")
# For TextInputFormat the key class is LongWritable (the byte offset)
# and the value class is Text (the line itself), in that order.
rdd = sc.hadoopFile(filename,
                    'org.apache.hadoop.mapred.TextInputFormat',
                    'org.apache.hadoop.io.LongWritable',
                    'org.apache.hadoop.io.Text')
rdd.count()
The above code worked for me... Good luck.
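(A variant of the solution above: once the access and secret keys are set on the Hadoop configuration, plain `sc.textFile` should also read the bucket, without spelling out the input format and key/value classes. The helper and the bucket/prefix names below are illustrative:)

```python
def s3n_path(bucket, prefix):
    """Build an s3n:// URL from a bucket name and a key prefix."""
    return "s3n://{}/{}".format(bucket, prefix)

# Assuming `sc` is a live SparkContext with the fs.s3n.* credentials
# already set as shown above:
# rdd = sc.textFile(s3n_path("your-bucket", "path/to/airline-data"))
# rdd.count()
```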