
Slow reading from AWS S3 bucket

I'm trying to read a file with pandas from an S3 bucket without downloading the file to disk. I've tried to use boto3 for that:

import io

import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket_name', Key='key')
read_file = io.BytesIO(obj['Body'].read())
df = pd.read_csv(read_file)

I've also tried s3fs:

import s3fs
import pandas as pd

fs = s3fs.S3FileSystem(anon=False)
with fs.open('bucket_name/path/to/file.csv', 'rb') as f:
    df = pd.read_csv(f)

The issue is that it takes too long to read the file: about 3 minutes for a 38 MB file. Is it supposed to be like that? If it is, is there any faster way to do the same thing? If it's not, any suggestions as to what might be causing the issue?

Thanks!

Based on this answer to a similar issue, you might want to check which region the bucket you're reading from is in, compared to where you're reading it from. It might be a simple change (assuming you have control over the bucket's location) that could improve performance drastically.
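As a rough sketch of that check, you can ask S3 for the bucket's region with `get_bucket_location` and compare it against your client's default region (the bucket name below is a placeholder; this assumes boto3 is installed and credentials are configured):

```python
def bucket_region(location_constraint):
    # get_bucket_location reports None as the LocationConstraint
    # for buckets in us-east-1, so normalise that case.
    return location_constraint or "us-east-1"

if __name__ == "__main__":
    import boto3  # imported here so the helper above stays importable without boto3

    s3 = boto3.client("s3")
    resp = s3.get_bucket_location(Bucket="bucket_name")
    region = bucket_region(resp.get("LocationConstraint"))
    print(f"bucket region: {region}, client region: {s3.meta.region_name}")

    # If they differ, reading through a client pinned to the bucket's
    # region avoids the cross-region round trips:
    # s3 = boto3.client("s3", region_name=region)
```

If the regions do differ and you can't move the bucket, pointing the client (or `s3fs.S3FileSystem`) at the bucket's own region is usually the cheaper fix.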

