Read a csv file from aws s3 using boto and pandas
I am trying to read a CSV object from an S3 bucket, and I can read the data successfully with the following code.

from boto.s3.connection import S3Connection
from boto.s3.key import Key

srcFileName = "gossips.csv"

def on_session_started():
    print("Starting new session.")
    conn = S3Connection()
    my_bucket = conn.get_bucket("randomdatagossip", validate=False)
    print("Bucket Identified")
    print(my_bucket)
    # Open the object by key name and print its raw contents
    key = Key(my_bucket, srcFileName)
    key.open()
    print(key.read())
    conn.close()

on_session_started()

However, if I try to read the same object as a data frame using pandas, I get an error. The most common one is S3ResponseError: 403 Forbidden.

import smart_open

def on_session_started2():
    print("Starting Second new session.")
    conn = S3Connection()
    my_bucket = conn.get_bucket("randomdatagossip", validate=False)
    # url = "https://s3.amazonaws.com/randomdatagossip/gossips.csv"
    # urllib2.urlopen(url)
    for line in smart_open.smart_open('s3://my_bucket/gossips.csv'):
        print line
    # data = pd.read_csv(url)
    # print(data)

on_session_started2()

What am I doing wrong? I am using Python 2.7 and cannot use Python 3.
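One detail worth checking in the second snippet: the URL passed to smart_open is the literal string 's3://my_bucket/gossips.csv', so the bucket being requested is named "my_bucket" rather than "randomdatagossip". That may be one source of the 403, though other causes such as credentials or bucket policy cannot be ruled out from the post alone. A minimal sketch of building the URL from the actual names, where build_s3_url is a hypothetical helper and not part of boto or smart_open:

```python
def build_s3_url(bucket_name, key_name):
    """Return an s3:// URL for a bucket and key (hypothetical helper)."""
    return "s3://{}/{}".format(bucket_name, key_name)


bucket_name = "randomdatagossip"   # bucket name taken from the question
src_file_name = "gossips.csv"      # key name taken from the question

url = build_s3_url(bucket_name, src_file_name)
print(url)
```

The resulting string could then be passed to smart_open in place of the hard-coded one.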
Here is what I did to successfully read a df from a csv on S3.

import pandas as pd
import boto3

bucket = "yourbucket"
file_name = "your_file.csv"

# 's3' is the service name; this creates a client using the default configuration
s3 = boto3.client('s3')
# Fetch the object (the file, identified by its key) from the bucket
obj = s3.get_object(Bucket=bucket, Key=file_name)
# 'Body' is the response key that holds the object's contents
initial_df = pd.read_csv(obj['Body'])

This worked for me.
import pandas as pd
import boto3
import io
s3_file_key = 'data/test.csv'
bucket = 'data-bucket'
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=s3_file_key)
initial_df = pd.read_csv(io.BytesIO(obj['Body'].read()))
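The buffering step above (read the whole body, wrap the bytes in io.BytesIO, then parse) can be tried without AWS access. The sketch below is only an illustration under stated assumptions: FakeBody is a hypothetical stand-in for obj['Body'], and the standard-library csv module is swapped in for pandas so that the example has no third-party dependencies. It is written for Python 3.

```python
import csv
import io


class FakeBody(object):
    """Hypothetical stand-in for obj['Body']: read() returns the raw bytes."""

    def __init__(self, payload):
        self._payload = payload

    def read(self):
        return self._payload


def rows_from_body(body):
    """Copy the body into memory and parse it as UTF-8 encoded CSV rows."""
    buffer = io.BytesIO(body.read())  # seekable in-memory copy of the bytes
    text = io.TextIOWrapper(buffer, encoding="utf-8", newline="")
    return list(csv.reader(text))


rows = rows_from_body(FakeBody(b"name,score\nalice,1\nbob,2\n"))
print(rows)
```

With a real response, obj['Body'] would take the place of FakeBody, and the same BytesIO buffer is what the answer hands to pd.read_csv.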
Maybe you could try using pandas read_sql with pyathena:
from pyathena import connect
import pandas as pd
conn = connect(s3_staging_dir='s3://bucket/folder',region_name='region')
df = pd.read_sql('select * from database.table', conn)