Read CSV from Amazon S3 using Python 2.7
I can easily list the bucket names from S3, but when I read a CSV file from S3 it gives an error every time.
import boto3
import pandas as pd

s3 = boto3.client('s3',
                  aws_access_key_id='yyyyyyyy',
                  aws_secret_access_key='xxxxxxxxxxx')

# Call S3 to list current buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print bucket['Name']
Output:

s3-bucket-data
import pandas as pd
import StringIO
from boto.s3.connection import S3Connection
AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('s3-bucket-data')
fileName = "data.csv"
content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))
Getting error:
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
How can I read the CSV from S3?
You can use the s3fs package. s3fs also supports AWS profiles in credential files.

Here is an example (you don't have to chunk it, but I just had this example handy):
import os
import pandas as pd
import s3fs
import gzip
chunksize = 999999
usecols = ["Col1", "Col2"]
filename = 'some_csv_file.csv.gz'
s3_bucket_name = 'some_bucket_name'
AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
s3f = s3fs.S3FileSystem(
    anon=False,
    key=AWS_KEY,
    secret=AWS_SECRET)

# or if you have a profile defined in credentials file:
#aws_shared_credentials_file = 'path/to/aws/credentials/file/'
#os.environ['AWS_SHARED_CREDENTIALS_FILE'] = aws_shared_credentials_file
#s3f = s3fs.S3FileSystem(
#    anon=False,
#    profile_name=s3_profile)

filepath = os.path.join(s3_bucket_name, filename)

with s3f.open(filepath, 'rb') as f:
    gz = gzip.GzipFile(fileobj=f)  # Decompress data with gzip
    chunks = pd.read_csv(gz,
                         usecols=usecols,
                         chunksize=chunksize,
                         iterator=True,
                         )
    # Stack the chunks row-wise (axis=0); axis=1 would put them side by side.
    df = pd.concat(chunks, axis=0, ignore_index=True)
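As a quick local check of the chunked read_csv pattern above, the sketch below fakes the gzipped S3 object with an in-memory buffer (the sample CSV content is invented, so this runs without any S3 credentials):

```python
import gzip
import io

import pandas as pd

# In-memory stand-in for the gzipped CSV object on S3.
raw = b"Col1,Col2\n1,a\n2,b\n3,c\n"
buf = io.BytesIO(gzip.compress(raw))

gz = gzip.GzipFile(fileobj=buf)  # decompress, as in the S3 example
chunks = pd.read_csv(gz, usecols=["Col1", "Col2"], chunksize=2)
df = pd.concat(chunks, ignore_index=True)
print(df.shape)  # (3, 2)
```

The row-wise concat is what reassembles the chunks into the original table.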
boto is one thing I love when it comes to handling data on S3 with Python.

Install boto using pip install boto:
import boto
from boto.s3.key import Key
keyId ="your_aws_key_id"
sKeyId="your_aws_secret_key_id"
srcFileName="abc.txt" # filename on S3
destFileName="s3_abc.txt" # output file name
bucketName="mybucket001" # S3 bucket name
conn = boto.connect_s3(keyId,sKeyId)
bucket = conn.get_bucket(bucketName)
#Get the Key object of the given key, in the bucket
k = Key(bucket,srcFileName)
#Get the contents of the key into a file
k.get_contents_to_filename(destFileName)
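After get_contents_to_filename runs, the object is just a local file, so pandas reads it the usual way. A minimal sketch (the CSV content here is made up so the snippet runs without an S3 connection):

```python
import pandas as pd

destFileName = "s3_abc.txt"  # same output name as above, used as a stand-in

# Simulate the downloaded content so this runs without S3.
with open(destFileName, "w") as f:
    f.write("id,value\n1,foo\n2,bar\n")

df = pd.read_csv(destFileName)
print(df.shape)  # (2, 2)
```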
I experienced this issue with a few AWS Regions. I created a bucket in "us-east-1" and the following code worked fine:
import boto
from boto.s3.key import Key
import StringIO
import pandas as pd
keyId ="xxxxxxxxxxxxxxxxxx"
sKeyId="yyyyyyyyyyyyyyyyyy"
srcFileName="zzzzz.csv"
bucketName="elasticbeanstalk-us-east-1-aaaaaaaaaaaa"
conn = boto.connect_s3(keyId,sKeyId)
bucket = conn.get_bucket(bucketName)
k = Key(bucket,srcFileName)
content = k.get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))
Try creating a new bucket in us-east-1 and see if it works.
Try the following:
import boto3
from boto3 import session
import pandas as pd
import io
session = boto3.session.Session(region_name='XXXX')
s3client = session.client('s3',
                          config=boto3.session.Config(signature_version='XXXX'))

response = s3client.get_object(Bucket='myBucket', Key='myKey')
dataset = pd.read_csv(io.BytesIO(response['Body'].read()), encoding='utf8')
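The final read_csv step can be checked locally; this sketch stands in for what response['Body'].read() hands to pandas (the bytes are invented sample data, not real S3 output):

```python
import io

import pandas as pd

# Stand-in for response['Body'].read(): raw CSV bytes fetched from S3.
body = b"name,score\nalice,10\nbob,20\n"
dataset = pd.read_csv(io.BytesIO(body), encoding="utf8")
print(list(dataset["name"]))  # ['alice', 'bob']
```

Wrapping the bytes in io.BytesIO gives pandas the file-like object it expects.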