How to read a big Azure blob storage file block by block
I want to read a huge Azure blob storage file and stream its contents to Event Hub. I found this example:
from azure.storage.blob import BlockBlobService
bb = BlockBlobService(account_name='', account_key='')
container_name = ""
blob_name_to_download = "test.txt"
file_path = "/home/Adam/Downloaded_test.txt"
bb.get_blob_to_path(container_name, blob_name_to_download, file_path, open_mode='wb',
                    snapshot=None, start_range=None, end_range=None, validate_content=False,
                    progress_callback=None, max_connections=2, lease_id=None,
                    if_modified_since=None, if_unmodified_since=None,
                    if_match=None, if_none_match=None, timeout=None)
But this way you can't get the blocks in a loop, which is what I want to do. So how can I modify this code for my case?
If you look closely, the get_blob_to_path method has two parameters: start_range and end_range. These two parameters let you read the blob's data in chunks.
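To make the range semantics concrete, here is a minimal sketch of how a blob of known size splits into (start_range, end_range) pairs. The helper name `iter_ranges` is hypothetical and not part of the Azure SDK. The sketch assumes both bounds are inclusive byte offsets, which is how the code in this thread uses them.

```python
def iter_ranges(blob_size, chunk_size):
    """Yield inclusive (start, end) byte offsets covering blob_size bytes."""
    start = 0
    while start < blob_size:
        # Clamp the last range so it never points past the final byte.
        end = min(start + chunk_size, blob_size) - 1
        yield start, end
        start = end + 1


# A 10-byte blob read 4 bytes at a time needs three ranges.
ranges = list(iter_ranges(10, 4))
print(ranges)
```

Each pair could then be passed as start_range and end_range to one of the get_blob_xxx calls.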
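What you need to do is first get the blob's properties to find its length, then call a get_blob_xxx method repeatedly to fetch the data in chunks. I used the get_blob_to_text method, but you can look at the other methods here.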
Here is the pseudocode I came up with. HTH.
bb = BlockBlobService(account_name='', account_key='')
container_name = ""
blob_name_to_download = "test.txt"
file_path = "/home/Adam/Downloaded_test.txt"
# First get the blob's properties. We want to find out the blob's content length.
blob = bb.get_blob_properties(container_name, blob_name_to_download)
# Extract the content length from the blob's properties.
blob_size = blob.properties.content_length
# Now let's say we want to fetch 1MB at a time, so we loop and fetch 1MB of content per call.
start = 0
end = blob_size
chunk_size = 1 * 1024 * 1024  # 1MB
do
    start_range = start
    end_range = start + chunk_size - 1
    blob_chunk_content = bb.get_blob_to_text(container_name, blob_name_to_download,
        encoding='utf-8', snapshot=None, start_range=start_range, end_range=end_range,
        validate_content=False, progress_callback=None, max_connections=2,
        lease_id=None, if_modified_since=None, if_unmodified_since=None,
        if_match=None, if_none_match=None, timeout=None)
    # blob_chunk_content will have up to 1MB of data. Do whatever you like with it.
    start = end_range + 1
while (start < end)
Here is a Python version of Gaurav's pseudocode. Note that I had to install the azure.storage.blob package with pip install azure-storage-blob==2.1.0.
from azure.storage.blob import BlockBlobService
bb = BlockBlobService(account_name='<storage_account_name>', account_key='<sas_key>')
container_name = "<container_name>"
blob_name = "<dir>/<file>"
#First get blob properties. We would want to find out blob's content length
blob = bb.get_blob_properties(container_name=container_name, blob_name=blob_name)
#extract content length from blob's properties
blob_size = blob.properties.content_length
#now let's say we want to fetch 1MB chunk at a time,
# so we loop and fetch 1MB content at a time.
start = 0
end = blob_size
chunk_size = 1 * 1024 * 1024 # 1MB
while start < end:
    start_range = start
    end_range = start + chunk_size - 1
    blob_chunk_content = bb.get_blob_to_text(container_name, blob_name,
        encoding='utf-8', snapshot=None, start_range=start_range, end_range=end_range,
        validate_content=False, progress_callback=None, max_connections=2,
        lease_id=None, if_modified_since=None, if_unmodified_since=None,
        if_match=None, if_none_match=None, timeout=None)
    print(blob_chunk_content.content)
    start = end_range + 1
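One caveat with decoding each chunk as UTF-8 on its own: a fixed byte boundary can fall in the middle of a multi-byte character, and that chunk may then fail to decode. The sketch below is one possible way around this, not part of the answers above. It fetches raw bytes and feeds them to an incremental decoder. `iter_text_chunks` and `fetch_range` are hypothetical names. With the 2.1.0 SDK, `fetch_range` could presumably wrap `bb.get_blob_to_bytes(container_name, blob_name, start_range=..., end_range=...).content`; here an in-memory stand-in is used so the example runs without an Azure account.

```python
import codecs


def iter_text_chunks(fetch_range, blob_size, chunk_size=1024 * 1024, encoding="utf-8"):
    """Yield decoded text, fetching the blob in inclusive byte ranges.

    fetch_range(start, end) is a hypothetical callable that returns the
    bytes for the inclusive range [start, end].
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    start = 0
    while start < blob_size:
        end = min(start + chunk_size, blob_size) - 1
        # The decoder buffers an incomplete trailing character until
        # the next chunk supplies the remaining bytes.
        text = decoder.decode(fetch_range(start, end))
        if text:
            yield text
        start = end + 1
    # Flush; raises UnicodeDecodeError if the blob ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# In-memory stand-in for the blob (an assumption for illustration only).
payload = "héllo wörld ✓".encode("utf-8")


def fake_fetch(start, end):
    return payload[start:end + 1]


chunks = list(iter_text_chunks(fake_fetch, len(payload), chunk_size=2))
print("".join(chunks))
```

Because the function is a generator, only one chunk is held in memory at a time, so each yielded piece can be forwarded to a downstream consumer before the next range is requested.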