
Azure Blob - Read using Python

Is it possible to read a CSV file directly from Azure Blob Storage as a stream and process it using Python? I know it can be done in C#/.NET (shown below), but I would like to know the equivalent library in Python.

CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("outfiles");
CloudBlob blob = container.GetBlobReference("Test.csv");

Yes, it is certainly possible. Check out the Azure Storage SDK for Python. The example below uses BlockBlobService, which is part of the legacy (pre-12.0) SDK:

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')

block_blob_service.get_blob_to_path('mycontainer', 'myblockblob', 'out-sunset.png')

You can read the complete SDK documentation here: http://azure-storage.readthedocs.io

Here's a way to do it with the new version of the SDK (12.0.0):

from azure.storage.blob import BlobClient

blob = BlobClient(account_url="https://<account_name>.blob.core.windows.net",
                  container_name="<container_name>",
                  blob_name="<blob_name>",
                  credential="<account_key>")

with open("example.csv", "wb") as f:
    data = blob.download_blob()
    data.readinto(f)

See here for details.
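If the goal is to process the CSV rather than keep a copy on disk, the same BlobClient can be read into memory instead. The following is only a sketch: read_csv_rows is a name I made up, it assumes the blob is UTF-8 text and small enough to fit in memory, and it only relies on the client having download_blob().readall(), which the v12 BlobClient provides.

```python
import csv
import io


def read_csv_rows(blob_client, encoding="utf-8"):
    """Download a blob into memory and parse it as CSV rows (one dict per row).

    blob_client is assumed to be a v12 BlobClient, or any object whose
    download_blob() returns something with a readall() method.
    """
    # download_blob() returns a downloader; readall() returns the full bytes.
    raw = blob_client.download_blob().readall()
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def make_blob_client():
    """Build a BlobClient as in the answer above (placeholders must be filled in)."""
    # Imported here so the parsing helper works without the Azure package.
    from azure.storage.blob import BlobClient

    return BlobClient(
        account_url="https://<account_name>.blob.core.windows.net",
        container_name="<container_name>",
        blob_name="<blob_name>",
        credential="<account_key>",
    )
```

Usage would be something like rows = read_csv_rows(make_blob_client()), after which each row is a dict keyed by the CSV header.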

Provide your Azure storage account name and its secret key as the account key here:

block_blob_service = BlockBlobService(account_name='$$$$$$', account_key='$$$$$$')

This gets the blob and saves it in the current directory as 'output.jpg':

block_blob_service.get_blob_to_path('your-container-name', 'your-blob', 'output.jpg')

This gets the blob's content (as bytes) directly, without saving a file:

blob_item = block_blob_service.get_blob_to_bytes('your-container-name', 'blob-name')

blob_item.content

One can stream from a blob with Python like this:

from tempfile import NamedTemporaryFile
from azure.storage.blob.blockblobservice import BlockBlobService

# conf is assumed to be a dict of settings loaded elsewhere
entry_path = conf['entry_path']
container_name = conf['container_name']
blob_service = BlockBlobService(
    account_name=conf['account_name'],
    account_key=conf['account_key'])

def get_file(filename):
    # Download the blob into a temporary file and rewind it for reading
    local_file = NamedTemporaryFile()
    blob_service.get_blob_to_stream(container_name, filename, stream=local_file,
                                    max_connections=2)

    local_file.seek(0)
    return local_file
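The temporary-file approach above still writes the whole blob to disk before anything is processed. With the v12 SDK, the downloader returned by download_blob() exposes chunks(), so the CSV can be parsed while it is still arriving. This is a rough sketch under a few assumptions: iter_lines and stream_csv_rows are hypothetical helper names, and the blob is assumed to be UTF-8 (or another encoding where the newline byte never occurs inside a multi-byte character).

```python
import csv


def iter_lines(chunks, encoding="utf-8"):
    """Yield decoded text lines (line endings kept) from an iterable of byte chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the tail may be partial.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            # Keep the terminator so csv can handle newlines inside quoted fields.
            yield (line + b"\n").decode(encoding)
    if pending:
        # Last line of the blob had no trailing newline.
        yield pending.decode(encoding)


def stream_csv_rows(blob_client, encoding="utf-8"):
    """Return a csv.reader that pulls rows lazily from a v12 BlobClient.

    Only download_blob().chunks() is used, so any object with that
    shape works here.
    """
    downloader = blob_client.download_blob()
    return csv.reader(iter_lines(downloader.chunks(), encoding))
```

A loop such as for row in stream_csv_rows(blob): ... then only holds roughly one chunk plus one partial line in memory at a time.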

I recommend using smart_open.

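import os
from azure.storage.blob import BlobServiceClient
from smart_open import open

# smart_open needs an Azure client passed in via transport_params
connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
transport_params = {'client': BlobServiceClient.from_connection_string(connect_str)}

# stream from Azure Blob Storage
with open('azure://my_container/my_file.txt', transport_params=transport_params) as fin:
    for line in fin:
        print(line)

# stream content *into* Azure Blob Storage (write mode):
with open('azure://my_container/my_file.txt', 'wb', transport_params=transport_params) as fout:
    fout.write(b'hello world')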

Here is a simple way to read a CSV from a blob using Pandas:

import os
import pandas as pd
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string(os.environ['AZURE_STORAGE_CONNECTION_STRING'])
client = service_client.get_container_client("your_container")
bc = client.get_blob_client(blob="your_folder/yourfile.csv")

with open("yourfile.csv", 'wb') as file:
    data = bc.download_blob()
    file.write(data.readall())

volantino_df = pd.read_csv("yourfile.csv")

I know this is an old post, but in case someone wants to do the same: I was able to access the blobs with the code below.

Note: you need to set AZURE_STORAGE_CONNECTION_STRING. You can get the connection string from the Azure Portal: go to your storage account -> Settings -> Access keys.

For Windows: setx AZURE_STORAGE_CONNECTION_STRING "<your-connection-string>"

For Linux: export AZURE_STORAGE_CONNECTION_STRING="<your-connection-string>"

For macOS: export AZURE_STORAGE_CONNECTION_STRING="<your-connection-string>"

import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
print(connect_str)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
# get_container_client expects a container name, not the storage account name
container_client = blob_service_client.get_container_client("Your Container Name Here")
try:

    print("\nListing blobs...")

    # List the blobs in the container
    blob_list = container_client.list_blobs()
    for blob in blob_list:
        print("\t" + blob.name)

except Exception as ex:
    print('Exception:')
    print(ex)
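One thing that tripped me up with this kind of setup: if the environment variable is missing, os.getenv returns None and the failure only shows up later, inside the SDK. On Windows, setx also only affects newly opened command windows. A small check up front can make that easier to spot. This is just a sketch; get_connection_string and parse_connection_string are hypothetical helpers, and the parser assumes the usual semicolon-separated Key=Value layout of a storage connection string.

```python
import os


def get_connection_string(var_name="AZURE_STORAGE_CONNECTION_STRING"):
    """Return the connection string from the environment or fail with a clear message."""
    value = os.environ.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"{var_name} is not set or is empty")
    return value


def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Only the first '=' in each pair is treated as the separator, so the
    '=' padding at the end of a base64 account key is preserved.
    """
    parts = {}
    for item in conn_str.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, val = item.partition("=")
        parts[key] = val
    return parts
```

With these, printing parse_connection_string(get_connection_string())["AccountName"] is a quick way to confirm the right account is configured without echoing the key itself.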

Since I wasn't able to find what I needed in this thread, I wanted to follow up on @SebastianDziadzio's answer and show how to retrieve the data without downloading it as a local file, which is what I was looking for myself.

Replace the with statement with the following:

from io import BytesIO
import pandas as pd

with BytesIO() as input_blob:
    blob_client_instance.download_blob().download_to_stream(input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob, compression='infer', index_col=0)

I struggled a lot with this and don't want anyone else to go through the same. If you are using openpyxl and want to write directly from an Azure Function to Blob Storage, follow the steps below and you will get what you are looking for.

Thanks. HMU if you need any help.

blob = BlobClient.from_connection_string(conn_str=conString, container_name=container_name, blob_name=r'YOUR_PATH/test1.xlsx')
blob.upload_blob(save_virtual_workbook(wb))
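As far as I know, save_virtual_workbook is no longer available in recent openpyxl releases, so the snippet above may fail on import there. A possible alternative is to save the workbook into a BytesIO buffer and upload those bytes. This is a sketch, not a drop-in replacement: workbook_to_bytes and upload_workbook are names I made up, wb is assumed to be an openpyxl Workbook, and blob_client a v12 BlobClient.

```python
from io import BytesIO


def workbook_to_bytes(wb):
    """Serialize a workbook to bytes in memory.

    wb is assumed to be an openpyxl Workbook; its save() method accepts
    a file-like object as well as a filename.
    """
    buffer = BytesIO()
    wb.save(buffer)
    return buffer.getvalue()


def upload_workbook(blob_client, wb, overwrite=True):
    """Upload the serialized workbook through a v12 BlobClient."""
    data = workbook_to_bytes(wb)
    # overwrite=True replaces an existing blob with the same name.
    blob_client.upload_blob(data, overwrite=overwrite)
    return len(data)
```

In the Azure Function, this would be called as upload_workbook(blob, wb) with the blob client created as shown above.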

