I have an S3 file which I am trying to access through Python code. I am submitting my code on an EC2 instance via spark-submit. To do the submission, I use the following command after starting the master and the slave:
./spark-submit --py-files /home/usr/spark-1.5.0/sbin/test_1.py
I get the following error: urllib2.HTTPError: HTTP Error 403: Forbidden
In test_1.py, I access the S3 file using the following:
import pandas as pd
import numpy as np
import boto
from boto.s3.connection import S3Connection
AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('BKT')
for file_key in bucket.list():
    print file_key.name
df = pd.read_csv('https://BKT.s3.amazonaws.com/test_1.csv')
The above code works well on my local machine, but it does not work on the EC2 instance. Please let me know if anyone has a solution.
You cannot access the file through that link because S3 objects are private by default. You can either change the object's permissions or fetch it with boto:
import pandas as pd
import StringIO
from boto.s3.connection import S3Connection
AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('BKT')
fileName = "test_1.csv"
# Option 1: save the file locally, then read it.
with open(fileName, 'w+') as writer:
    bucket.get_key(fileName).get_file(writer)
with open(fileName, 'r') as reader:
    df = pd.read_csv(reader)

# Option 2: read it without saving the file locally.
content = bucket.get_key(fileName).get_contents_as_string()
df = pd.read_csv(StringIO.StringIO(content))
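The in-memory option works because `pd.read_csv` accepts any file-like object, not just a path. A minimal self-contained sketch of that step (Python 3, with a hard-coded CSV string standing in for the S3 download):

```python
import pandas as pd
from io import StringIO  # the Python 2 equivalent is StringIO.StringIO

# Stand-in for bucket.get_key(fileName).get_contents_as_string()
content = "col_a,col_b\n1,2\n3,4\n"

# read_csv accepts any file-like object, so no temporary file is needed
df = pd.read_csv(StringIO(content))
print(df.shape)  # (2, 2)
```

If you would rather keep the plain HTTPS link working, another option is a pre-signed URL (for example `key.generate_url(expires_in=300)` in boto 2), which embeds a temporary signature so the object can be fetched without making it public.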