Loading npy files from S3 in Python

Question

Is there any way to load/read an external file (e.g., from AWS S3) in numpy? I have several npy files stored in S3. I have tried to access them through an S3 presigned URL, but it seems that neither numpy.load nor np.genfromtxt is able to read them.

I don't want to save the files to the local file system and then load them into numpy.

Any ideas?

Answer 1

Using s3fs:

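import numpy as np
from s3fs.core import S3FileSystem

s3 = S3FileSystem()  # uses the default AWS credentials

key = 'your_file.npy'
bucket = 'your_bucket'

# s3.open returns a file-like object that np.load can read directly
with s3.open('{}/{}'.format(bucket, key), 'rb') as f:
    arr = np.load(f)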

You might have to set allow_pickle=True, depending on the file being read.
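To illustrate when allow_pickle matters, here is a minimal sketch that uses an in-memory buffer in place of an S3 file, so it runs without any AWS access. The helper name roundtrip and the sample arrays are made up for illustration. Arrays with plain numeric dtypes load without pickling, while object arrays are refused unless allow_pickle=True. Since unpickling can execute arbitrary code, only enable it for files you trust.

```python
import io

import numpy as np


def roundtrip(arr, allow_pickle):
    """Save arr to an in-memory .npy buffer and load it back."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=True)  # object arrays are pickled on save
    buf.seek(0)
    return np.load(buf, allow_pickle=allow_pickle)


numeric = np.arange(6, dtype=np.float64).reshape(2, 3)

# An object array: its elements are arbitrary Python objects.
ragged = np.empty(2, dtype=object)
ragged[0] = [1, 2, 3]
ragged[1] = {"a": 1}

# Plain numeric data loads fine with pickling disabled.
loaded_numeric = roundtrip(numeric, allow_pickle=False)

# Object arrays raise ValueError when pickling is disabled.
try:
    roundtrip(ragged, allow_pickle=False)
    needs_pickle = False
except ValueError:
    needs_pickle = True

loaded_ragged = roundtrip(ragged, allow_pickle=True)
```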

Answer 2

I compared s3fs and io.BytesIO for loading a 28 GB npz file from S3. s3fs took 30 min, while io.BytesIO took 12 min.

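import io
import boto3
import numpy as np
from s3fs.core import S3FileSystem

# io.BytesIO: read the whole object into memory with boto3, then load it
s3_session = boto3.Session()  # assumed; the original did not show how the session was created
obj = s3_session.resource("s3").Object(bucket, key)
with io.BytesIO(obj.get()["Body"].read()) as f:
    X, y = np.load(f).values()

# s3fs: let np.load read from the s3fs file-like object
fs = S3FileSystem()
with fs.open(f"s3://{bucket}/{key}") as s3file:
    X, y = np.load(s3file).values()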
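Unpacking with X, y = np.load(f).values() depends on the order in which the arrays were saved. As a hedged alternative, the sketch below looks the arrays up by name. It assumes the archive was written with the names X and y (not confirmed above), and load_npz_bytes is a hypothetical helper. A small archive built in memory stands in for the bytes returned by obj.get()["Body"].read().

```python
import io

import numpy as np


def load_npz_bytes(raw):
    """Load every array from the raw bytes of an .npz archive into a dict."""
    with np.load(io.BytesIO(raw)) as npz:
        # npz.files lists the array names stored in the archive.
        return {name: npz[name] for name in npz.files}


# Stand-in for the bytes downloaded from S3 (array names are assumptions).
buf = io.BytesIO()
np.savez(buf, X=np.ones((4, 2)), y=np.arange(4))

arrays = load_npz_bytes(buf.getvalue())
X, y = arrays["X"], arrays["y"]
```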

Answer 3

I had success using boto and an in-memory buffer (StringIO on Python 2, io.BytesIO on Python 3). Connect to S3 using boto and get your bucket. Then read the file into numpy with the following code:

  import numpy as np
  from io import BytesIO  # replaces Python 2's StringIO; npy data is binary

  # bucket is the boto Bucket object obtained from your S3 connection
  key = bucket.get_key('YOUR_KEY')
  data_string = BytesIO(key.get_contents_as_string())
  data = np.load(data_string)

I am not sure it's the most efficient way, but it doesn't require a public URL.

Cheers, Michael
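On the presigned URL mentioned in the question: np.load treats a string argument as a local file path, so passing it a URL directly does not work. A minimal sketch of one workaround, using only the standard library, is to download the bytes and wrap them in io.BytesIO. The function name load_npy_from_url is hypothetical, and this has not been checked against a real presigned URL.

```python
import io
import urllib.request

import numpy as np


def load_npy_from_url(url, allow_pickle=False):
    """Download the object at url into memory and parse it as .npy data."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    # np.load needs a seekable file-like object, which BytesIO provides.
    return np.load(io.BytesIO(raw), allow_pickle=allow_pickle)
```

Usage would look like arr = load_npy_from_url(presigned_url), where presigned_url is the URL generated for the object. Nothing is written to the local file system, but the whole file is held in memory.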

3 answers

Answer 1: 10, accepted, 2019-08-06 19:57:32

Answer 2: 2, 2020-10-28 20:59:42

Answer 3: 1, 2016-12-30 10:34:23
