numpy.load from io.BytesIO stream
I have numpy arrays saved in Azure Blob Storage, and I'm loading them into a stream like this:
stream = io.BytesIO()
store.get_blob_to_stream(container, 'cat.npy', stream)
I know from stream.getvalue() that the stream contains the metadata needed to reconstruct the array. These are the first 150 bytes:
b"\x93NUMPY\x01\x00v\x00{'descr': '|u1', 'fortran_order': False, 'shape': (720, 1280, 3), } \n\xc1\xb0\x94\xc2\xb1\x95\xc3\xb2\x96\xc4\xb3\x97\xc5\xb4\x98\xc6\xb5\x99\xc7\xb6\x9a\xc7"
Is it possible to load the byte stream with numpy.load, or by some other simple method?
I could instead save the array to disk and load it from disk, but I'd like to avoid that for several reasons...
EDIT: just to emphasize, the output would need to be a numpy array with the shape and dtype specified in the first 128 bytes of the stream.
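For reference, the bytes quoted above follow the .npy version 1.0 layout (magic string, version, little-endian header length, then an ASCII dict with descr/fortran_order/shape), which can be reproduced locally without any Azure involvement:

```python
import io
import numpy as np

# Write an array with the same dtype and shape as in the question,
# then inspect the raw .npy bytes np.save produces.
buf = io.BytesIO()
np.save(buf, np.zeros((720, 1280, 3), dtype=np.uint8))
raw = buf.getvalue()

assert raw[:6] == b'\x93NUMPY'   # magic string
assert raw[6:8] == b'\x01\x00'   # format version 1.0
header_len = int.from_bytes(raw[8:10], 'little')   # b'v\x00' in the question's dump
assert (10 + header_len) % 64 == 0   # header is padded to a 64-byte multiple
header = raw[10:10 + header_len].decode('latin1')
print(header)   # the descr/fortran_order/shape dict shown above
```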
I tried several ways to meet your needs. Here is my sample code.
from azure.storage.blob.baseblobservice import BaseBlobService
import numpy as np
account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<your container name>'
blob_name = '<your blob name>'
blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)
Sample 1. Generate a blob url with a sas token and get the content via requests
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
import requests
sas_token = blob_service.generate_blob_shared_access_signature(container_name, blob_name, permission=BlobPermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1))
print(sas_token)
url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
print(url_with_sas)
r = requests.get(url_with_sas)
dat = np.frombuffer(r.content)
print('from requests', dat)
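Note that np.frombuffer on the raw response body will also interpret the 128-byte .npy header as array data and use the default float64 dtype. If the goal is to honor the dtype and shape recorded in the header, one option (a sketch, not part of the original answer; the download is simulated locally here) is to wrap the bytes in BytesIO and hand them to np.load:

```python
import io
import numpy as np

# Simulate r.content locally; a real run would use requests.get(url_with_sas).content.
arr = np.random.rand(4, 5).astype(np.float32)
buf = io.BytesIO()
np.save(buf, arr)
content = buf.getvalue()

# np.load parses the .npy header, so dtype and shape are restored correctly.
restored = np.load(io.BytesIO(content))
assert restored.shape == (4, 5) and restored.dtype == np.float32
```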
Sample 2. Download the content of the blob into memory via BytesIO
import io
stream = io.BytesIO()
blob_service.get_blob_to_stream(container_name, blob_name, stream)
dat = np.frombuffer(stream.getbuffer())
print('from BytesIO', dat)
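The same header caveat applies here. If the exact dtype and shape are needed, rewinding the stream and calling np.load on it (as the question hoped) works; a local sketch with the blob download replaced by an in-memory write:

```python
import io
import numpy as np

# Local stand-in for get_blob_to_stream: write .npy bytes into the stream.
stream = io.BytesIO()
np.save(stream, np.zeros((720, 1280, 3), dtype=np.uint8))

stream.seek(0)         # get_blob_to_stream leaves the cursor at the end
dat = np.load(stream)  # the header is parsed, so dtype/shape are restored
assert dat.shape == (720, 1280, 3) and dat.dtype == np.uint8
```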
Sample 3. Use numpy.fromfile with DataSource to open a blob url with a sas token; this will actually download the blob file into the local filesystem.
ds = np.DataSource()
# ds = np.DataSource(None) # use with temporary file
# ds = np.DataSource(path) # use with path like `data/`
f = ds.open(url_with_sas)
dat = np.fromfile(f)
print('from DataSource', dat)
I think Samples 1 & 2 are better for you.
When it comes to np.savez, the above solutions normally won't work.
import io
import numpy as np
stream = io.BytesIO()
arr1 = np.random.rand(20,4)
arr2 = np.random.rand(20,4)
np.savez(stream, A=arr1, B=arr2)
block_blob_service.create_blob_from_bytes(container,
                                          "my/path.npz",
                                          stream.getvalue())
from numpy.lib.npyio import NpzFile
stream = io.BytesIO()
block_blob_service.get_blob_to_stream(container, "my/path.npz", stream)
ret = NpzFile(stream, own_fid=True, allow_pickle=True)
print(ret.files)
""" ['A', 'B'] """
print(ret['A'].shape)
""" (20, 4) """
This is a bit of a hacky way I came up with, which basically just gets the metadata from the first 128 bytes:
import re
import numpy as np

def load_npy_from_stream(stream_):
    """Experimental, may not work!
    :param stream_: io.BytesIO() object obtained by e.g. calling BlockBlobService().get_blob_to_stream() containing
        the binary stream of a standard format .npy file.
    :return: numpy.ndarray
    """
    stream_.seek(0)
    prefix_ = stream_.read(128)  # first 128 bytes seem to be the metadata
    dict_string = re.search(r'\{(.*?)\}', prefix_[1:].decode())[0]
    metadata_dict = eval(dict_string)
    array = np.frombuffer(stream_.read(), dtype=metadata_dict['descr']).reshape(metadata_dict['shape'])
    return array
Could fail in numerous ways, but I'm posting it here if anyone wants to give it a shot. I'll be running tests with this and will get back as I know more.
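One way to sanity-check the header-parsing idea is a round trip entirely in memory, since np.save produces the same bytes a .npy blob would contain (this inlines the same parsing logic rather than calling the function above, so it runs standalone):

```python
import io
import re
import numpy as np

# Write a small array; its .npy v1.0 header is padded to exactly 128 bytes.
arr = np.arange(24, dtype=np.uint8).reshape(2, 3, 4)
stream = io.BytesIO()
np.save(stream, arr)

stream.seek(0)
prefix = stream.read(128)  # magic + version + header dict, space-padded
dict_string = re.search(r'\{(.*?)\}', prefix[1:].decode('latin1'))[0]
meta = eval(dict_string)   # e.g. {'descr': '|u1', 'fortran_order': False, 'shape': (2, 3, 4)}
restored = np.frombuffer(stream.read(), dtype=meta['descr']).reshape(meta['shape'])

assert restored.shape == (2, 3, 4)
assert np.array_equal(restored, arr)
```

Note the 128-byte assumption holds only while the header dict plus padding fits in one 64-byte-aligned block; very high-dimensional shapes or structured dtypes could push the header past 128 bytes.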
A little late, but in case anyone wants to do this using numpy.load, here's the code (Azure SDK v12.8.1):
from azure.storage.blob import BlobServiceClient
import io
import numpy as np
# define your connection parameters
connect_str = ''
container_name = ''
blob_name = ''
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container=container_name,
blob=blob_name)
# Get StorageStreamDownloader
blob_stream = blob_client.download_blob()
stream = io.BytesIO()
blob_stream.download_to_stream(stream)
stream.seek(0)
# Load from io.BytesIO object
data = np.load(stream, allow_pickle=False)
print(data.shape)