
Read csv from Azure blob Storage and store in a DataFrame

I am trying to read multiple CSV files from blob storage using Python.

The code I am using is:

blob_service_client = BlobServiceClient.from_connection_string(connection_str)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs(folder_root)
for blob in blobs_list:
    blob_client = blob_service_client.get_blob_client(container=container, blob=blob.name)
    stream = blob_client.download_blob().content_as_text()

I am not sure of the correct way to store the CSV files I read into a pandas DataFrame.

I tried using:

df = df.append(pd.read_csv(StringIO(stream)))

But this shows me an error.

Any idea how I can do this?
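For reference, the usual fix is to pass `blob.name` as an attribute (not the literal string `"blob.name"`) and to collect each parsed CSV into a list, concatenating once at the end (`df.append` was removed in pandas 2.0). A minimal sketch of the collection step, using hypothetical in-memory CSV text in place of the downloaded blob content:

```python
from io import StringIO
import pandas as pd

# In the real loop, each string would come from
# blob_client.download_blob().content_as_text(); these two CSV
# snippets are hypothetical stand-ins to illustrate the pattern.
csv_streams = [
    "id,value\n1,10\n2,20\n",
    "id,value\n3,30\n",
]

# Parse one DataFrame per blob, then concatenate a single time.
frames = [pd.read_csv(StringIO(s)) for s in csv_streams]
df = pd.concat(frames, ignore_index=True)
print(len(df))  # 3 combined rows
```

Building the list first and calling `pd.concat` once avoids the quadratic copying that per-iteration appends cause.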

You can download the file from blob storage and then read the data from the downloaded file into a pandas DataFrame.

from azure.storage.blob import BlockBlobService
import pandas as pd
import time

STORAGEACCOUNTNAME= <storage_account_name>
STORAGEACCOUNTKEY= <storage_account_key>
LOCALFILENAME= <local_file_name>
CONTAINERNAME= <container_name>
BLOBNAME= <blob_name>

#download from blob
t1=time.time()
blob_service=BlockBlobService(account_name=STORAGEACCOUNTNAME,account_key=STORAGEACCOUNTKEY)
blob_service.get_blob_to_path(CONTAINERNAME,BLOBNAME,LOCALFILENAME)
t2=time.time()
print(("It takes %s seconds to download " + BLOBNAME) % (t2 - t1))

# LOCALFILENAME is the local file path
dataframe_blobdata = pd.read_csv(LOCALFILENAME)

For more details, see here.


If you want to do the conversion directly, this code will help. You need to get the content from the blob object, and `get_blob_to_text` does not need a local file name.

from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME).content
df = pd.read_csv(StringIO(blobstring))
import pandas as pd
data = pd.read_csv('blob_sas_url')

The Blob SAS URL can be found by right-clicking the blob file you want to import in the Azure portal and selecting Generate SAS. Then click the Generate SAS token and URL button and copy the SAS URL into the code above in place of blob_sas_url.

You can now read directly from Blob Storage into a pandas DataFrame:

import os
import pandas as pd

mydata = pd.read_csv(
    f"abfs://{blob_path}",
    storage_options={
        "connection_string": os.environ["STORAGE_CONNECTION"]
    },
)

where blob_path is the path to the file, in the form {container-name}/{blob-prefix}.csv. Note that the abfs:// protocol requires the adlfs package to be installed; it registers the filesystem with fsspec, which pandas uses under the hood.

BlockBlobService, part of azure-storage, is deprecated. Use the following instead:

!pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient
import pandas as pd

STORAGEACCOUNTURL= <storage_account_url>
STORAGEACCOUNTKEY= <storage_account_key>
LOCALFILENAME= <local_file_name>
CONTAINERNAME= <container_name>
BLOBNAME= <blob_name>

#download from blob
blob_service_client_instance=BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(CONTAINERNAME, BLOBNAME, snapshot=None)
with open(LOCALFILENAME, "wb") as my_blob:
    blob_data = blob_client_instance.download_blob()
    blob_data.readinto(my_blob)

#import blob to dataframe
df = pd.read_csv(LOCALFILENAME)

LOCALFILENAME is the same as BLOBNAME.

BlockBlobService is indeed deprecated. However, @Deepak's answer did not work for me. The following works:

import pandas as pd
from io import BytesIO
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING= <connection_string>
CONTAINERNAME= <container_name>
BLOBNAME= <blob_name>

blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINERNAME)
blob_client = container_client.get_blob_client(BLOBNAME)

with BytesIO() as input_blob:
    blob_client.download_blob().download_to_stream(input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob)
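The `BytesIO` step above can be sketched and verified locally without an Azure account by substituting any in-memory byte stream for the downloaded blob; the CSV content below is a hypothetical stand-in:

```python
from io import BytesIO
import pandas as pd

# Stand-in for blob_client.download_blob().download_to_stream(input_blob):
# write CSV bytes into the buffer, then rewind before parsing.
input_blob = BytesIO()
input_blob.write(b"id,value\n1,10\n2,20\n")
input_blob.seek(0)

df = pd.read_csv(input_blob)
print(df.shape)  # (2, 2)
```

The `seek(0)` call is the step people most often miss: after writing, the stream position sits at the end, so `read_csv` would otherwise see an empty file.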

