
Reading excel files from "input" blob storage container and exporting to csv in "output" container with python

I'm trying to develop a Python script that reads an .xlsx file from a blob storage container called "source", converts it to .csv and stores it in a new container (I'm testing the script locally; if it works, I'll include it in an ADF pipeline). So far I've managed to access the blob storage, but I'm having trouble reading the file content.

from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
import pandas as pd

conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
container = "source"
blob_name = "prova.xlsx"

container_client = ContainerClient.from_connection_string(
    conn_str=conn_str, 
    container_name=container
    )
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)

df = pd.read_excel(downloaded_blob)

print(df)

I get the following error:

ValueError: Invalid file path or buffer object type: <class 'azure.storage.blob._download.StorageStreamDownloader'>

I tried with a .csv file as input, writing the parsing code as follows:

df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))

and it works.

Any suggestions on how to modify the code so that the Excel file becomes readable?

I summarize the solution below.

When we use the method pd.read_excel() in the pandas library, we need to provide bytes as input. But when we use download_blob to download the Excel file from Azure Blob Storage, we just get an azure.storage.blob.StorageStreamDownloader. So we need to use the method readall() or content_as_bytes() to convert it to bytes. For more details, please refer to the pandas and azure-storage-blob documentation.

Change

df = pd.read_excel(downloaded_blob)

to

df = pd.read_excel(downloaded_blob.content_as_bytes())
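
For completeness, here is a minimal sketch of the full flow described in the question: download the .xlsx from the "source" container, convert it to CSV, and upload the result to the "output" container. The output blob name, the use of BlobServiceClient, and the openpyxl engine are assumptions for illustration, not part of the original answer.

import io
import pandas as pd
from azure.storage.blob import BlobServiceClient

conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
blob_service_client = BlobServiceClient.from_connection_string(conn_str)

# Download the Excel blob from the "source" container as bytes
source_blob = blob_service_client.get_blob_client(container="source", blob="prova.xlsx")
excel_bytes = source_blob.download_blob().readall()  # readall() returns bytes

# Read the workbook; wrapping the bytes in BytesIO also works
df = pd.read_excel(io.BytesIO(excel_bytes), engine="openpyxl")  # engine assumed; requires openpyxl

# Convert the DataFrame to CSV text and upload it to the "output" container (blob name assumed)
csv_data = df.to_csv(index=False)
output_blob = blob_service_client.get_blob_client(container="output", blob="prova.csv")
output_blob.upload_blob(csv_data, overwrite=True)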
