Download Blob To Local Storage using Python

I'm trying to download a blob file & store it locally on my machine. The file format is HDF5 (a format I have limited/no experience of so far).

So far I've been successful in downloading something using the scripts below. The key issue is it doesn't seem to be the full file. When downloading the file directly from storage explorer it is circa 4,000kb. The HDF5 file I save is 2kb.

What am I doing wrong? Am I missing a readall() somewhere?

This is my first time working with blob storage and HDF5 files, so I'm a little stuck right now. A lot of the older questions seem to use deprecated commands, as the azure.storage.blob module has since been updated.

from azure.storage.blob import BlobServiceClient
from io import StringIO, BytesIO
import h5py

# Initialise client
blob_service_client = BlobServiceClient.from_connection_string("my_conn_str")
# Initialise container
blob_container_client = blob_service_client.get_container_client("container_name")
# Get blob
blob_client = blob_container_client.get_blob_client("file_path")

# Download
download_stream = blob_client.download_blob()

# Create empty stream
stream = BytesIO()
# Read downloaded blob into stream
download_stream.readinto(stream)
# Create new empty hdf5 file
hf = h5py.File('data.hdf5', 'w')
# Write stream into empty HDF5
hf.create_dataset('dataset_1',stream)
# Close Blob (& save)
hf.close()

I tried to reproduce the scenario on my system and faced the same issue with the code you tried.

So I tried another solution: read the HDF5 blob as a stream, then write its data into another HDF5 file.

Try this solution. I used some dummy data for testing purposes.

from io import BytesIO

import h5py
from azure.storage.blob import BlobServiceClient

# Initialise client
blob_service_client = BlobServiceClient.from_connection_string("Connection String")
# Initialise container ("test" is the container name)
blob_container_client = blob_service_client.get_container_client("test")
# Get blob ("test.hdf5" is the blob name)
blob_client = blob_container_client.get_blob_client("test.hdf5")

# Download the entire blob into memory.
# Note: the file can be many gigabytes, in which case this is a big problem.
stream = BytesIO()
blob_client.download_blob().readinto(stream)
stream.seek(0)  # rewind so the stream is read from the start
print("downloaded the blob")

# h5py can open the in-memory stream directly and read data from it
with h5py.File(stream, "r") as f:
    # List all top-level keys
    print("Keys: %s" % list(f.keys()))
    a_group_key = list(f.keys())[0]

    # Get the data stored under the first key
    data_matrix = list(f[a_group_key])
    print(data_matrix)

# Write the data into a new local HDF5 file
with h5py.File("file1.hdf5", "w") as data_file:
    data_file.create_dataset("group_name", data=data_matrix)
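
If the goal is only to get an identical copy of the blob on disk, it may be simpler to skip h5py for the download step. The blob is already a complete HDF5 file, so its raw bytes can be written straight to a local file opened in binary mode. Below is a minimal sketch of that idea. The helper name download_blob_to_file is hypothetical (it is not part of the Azure SDK), and blob_client is assumed to be the BlobClient built in the question.

```python
from pathlib import Path


def download_blob_to_file(blob_client, local_path):
    """Write the blob's raw bytes to local_path and return the file size.

    blob_client is assumed to be an azure.storage.blob.BlobClient, such as
    the one returned by get_blob_client() in the question. This helper is
    a hypothetical illustration, not part of the Azure SDK.
    """
    local_path = Path(local_path)
    with open(local_path, "wb") as fh:
        # download_blob() returns a StorageStreamDownloader; readinto()
        # streams the blob content into any writable binary file object,
        # so the whole blob does not have to be held in a BytesIO first.
        blob_client.download_blob().readinto(fh)
    return local_path.stat().st_size


# Usage, assuming the blob_client from the question:
#   size = download_blob_to_file(blob_client, "data.hdf5")
#   print(f"wrote {size} bytes")
```

The size returned should match what Storage Explorer reports for the blob. The saved file can then be opened with h5py.File("data.hdf5", "r") to inspect its contents.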

