Load npy file from S3 in Python
Is there any way to load/read an external file (e.g., one stored in AWS S3) with numpy? I have several npy files stored in S3.
I have tried to access them through an S3 presigned URL, but it seems that neither numpy.load nor np.genfromtxt is able to read them.
I don't want to save the files to the local file system and then load them into numpy.
Any ideas?
I've compared s3fs and io.BytesIO for loading a 28 GB npz file from S3. s3fs takes 30 min, while io.BytesIO takes 12 min.
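import io
import numpy as np
from s3fs import S3FileSystem

# io.BytesIO: read the whole object into memory (s3_session: a boto3 session)
obj = s3_session.resource("s3").Object(bucket, key)
with io.BytesIO(obj.get()["Body"].read()) as f:
    f.seek(0)  # rewind the file
    X, y = np.load(f).values()

# s3fs: open the object as a remote file
s3fs = S3FileSystem()
with s3fs.open(f"s3://{bucket}/{key}") as s3file:
    X, y = np.load(s3file).values()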
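The io.BytesIO variant above can be wrapped in a small helper. This is only a sketch under some assumptions: `load_npz_from_s3` is a hypothetical name, the object is an .npz archive, and `client` is anything exposing a boto3-style `get_object(Bucket=..., Key=...)` call (a default boto3 S3 client is created when none is passed).

```python
import io

import numpy as np


def load_npz_from_s3(bucket, key, client=None):
    """Download an .npz object into memory and return its arrays as a dict.

    `client` is any object with a boto3-style get_object(Bucket=..., Key=...)
    method; when omitted, a default boto3 S3 client is created (assumption).
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised without AWS

        client = boto3.client("s3")

    body = client.get_object(Bucket=bucket, Key=key)["Body"]
    # np.load needs a seekable file-like object, so buffer the whole payload.
    with io.BytesIO(body.read()) as buffer:
        with np.load(buffer) as archive:
            # Materialize every array before the buffer is closed.
            return {name: archive[name] for name in archive.files}
```

A call would look like `arrays = load_npz_from_s3("my-bucket", "data/train.npz")`, where both names are placeholders. Because the whole object is buffered in memory, a file of the size mentioned above needs at least that much RAM.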
I had success using boto and StringIO. Connect to S3 using boto and get your bucket. Then read the file into numpy with the following code:
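import numpy as np
from StringIO import StringIO  # Python 2; on Python 3 use io.BytesIO instead

# bucket is the boto bucket object obtained after connecting to S3
key = bucket.get_key('YOUR_KEY')
data_string = StringIO(key.get_contents_as_string())
data = np.load(data_string)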
I am not sure it's the most efficient way, but it doesn't require a public URL.
Cheers, Michael
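On Python 3 the StringIO module is gone and .npy data is bytes, so the same idea would go through io.BytesIO. A minimal sketch, assuming the raw bytes have already been fetched from S3 by whatever client is in use; `load_npy_bytes` is a hypothetical helper name, and the in-memory round trip only stands in for the download:

```python
import io

import numpy as np


def load_npy_bytes(raw):
    """Decode the raw bytes of a .npy file into a numpy array."""
    # allow_pickle=False refuses pickled object arrays, which is the safer
    # choice for data that comes from outside the process.
    return np.load(io.BytesIO(raw), allow_pickle=False)


# Round trip in memory, standing in for bytes fetched from S3.
buffer = io.BytesIO()
np.save(buffer, np.arange(5))
restored = load_npy_bytes(buffer.getvalue())
```

Nothing touches the local file system here: np.load only needs a seekable file-like object, which the in-memory buffer provides.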