
How to read a large (1 GB) Parquet file from Azure Blob Storage without downloading it to the local machine

from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(connection_string)

blob_client = blob_service_client.get_blob_client(container="ABC", blob="/xylem/pr/folder_with_parquet_files")
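
The blob path in the question points at a folder rather than a single file, and a blob client addresses one blob at a time. A minimal sketch of one way to handle that is below: list the blobs under the prefix, keep the `.parquet` ones, and read each into memory. The helper names `parquet_blob_names` and `read_parquet_folder` are hypothetical, and the sketch assumes blob names are stored without a leading slash and that `container_client` is an `azure.storage.blob.ContainerClient`.

```python
from io import BytesIO


def parquet_blob_names(blob_names, prefix):
    """Return the names under `prefix` that end in .parquet, sorted."""
    # Assumption: blob names carry no leading slash, so strip it from the prefix.
    prefix = prefix.lstrip("/")
    return sorted(
        name
        for name in blob_names
        if name.startswith(prefix) and name.lower().endswith(".parquet")
    )


def read_parquet_folder(container_client, prefix):
    """Read every Parquet blob under `prefix` into one DataFrame, in memory."""
    # Imported here so the name filter above has no third-party dependency.
    import pandas as pd

    listed = (
        blob.name
        for blob in container_client.list_blobs(name_starts_with=prefix.lstrip("/"))
    )
    frames = []
    for name in parquet_blob_names(listed, prefix):
        # readall() returns the blob content as bytes; nothing is written to disk.
        data = container_client.download_blob(name).readall()
        frames.append(pd.read_parquet(BytesIO(data), engine="pyarrow"))
    if not frames:
        raise FileNotFoundError(f"no .parquet blobs found under {prefix!r}")
    return pd.concat(frames, ignore_index=True)
```

With a real account, this would be called as `read_parquet_folder(blob_service_client.get_container_client("ABC"), "/xylem/pr/folder_with_parquet_files")`. Every file is still held in memory at once, so this only suits folders that fit in RAM.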

Below is the Python code I used to reproduce reading a Parquet file from Azure Blob Storage:

import pandas as pd
from io import BytesIO
from azure.storage.blob import BlobServiceClient

def main():
    # Replace with your storage account connection string.
    CONN_STR = "STORAGE_CONNECTION_STRING"
    blob_service_client = BlobServiceClient.from_connection_string(CONN_STR)
    # Point at the source Parquet blob.
    blob_client = blob_service_client.get_blob_client(container="parquetfiles", blob="userdata1.parquet")
    # Download the blob into an in-memory buffer instead of a local file.
    content = blob_client.download_blob()
    stream = BytesIO()
    content.readinto(stream)
    stream.seek(0)  # rewind so the reader starts at the beginning of the buffer
    # engine='pyarrow' requires the pyarrow package to be installed.
    processed_df = pd.read_parquet(stream, engine='pyarrow')
    print(processed_df)

if __name__ == "__main__":
    main()

The screenshot below shows the test output from my local VS Code:

Output

