Reading Excel files from "input" blob storage container and exporting to csv in "output" container with Python
I'm trying to develop a Python script that reads an .xlsx file from a blob storage container called "source", converts it to .csv, and stores it in a new container. (I'm testing the script locally; if it works, I'll include it in an ADF pipeline.) So far I've managed to access the blob storage, but I'm having problems reading the file content.
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
import pandas as pd
conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
container = "source"
blob_name = "prova.xlsx"
container_client = ContainerClient.from_connection_string(
conn_str=conn_str,
container_name=container
)
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)
df = pd.read_excel(downloaded_blob)
print(df)
I get the following error:
ValueError: Invalid file path or buffer object type: <class 'azure.storage.blob._download.StorageStreamDownloader'>
I tried a .csv file as input, with the parsing code written as follows:
df = pd.read_csv(StringIO(downloaded_blob.content_as_text()) )
and it works.
Any suggestions on how to modify the code so that the Excel file becomes readable?
I'll summarize the solution below.
The pd.read_excel() function in pandas needs bytes (or a file-like object) as input. But when we call download_blob to download the Excel file from Azure Blob Storage, we get back an azure.storage.blob.StorageStreamDownloader object, not the file content itself. So we need to call readall() or content_as_bytes() on it to convert it to bytes. For more details, please refer to the pandas and azure-storage-blob documentation.
Change
df = pd.read_excel(downloaded_blob)
to
df = pd.read_excel(downloaded_blob.content_as_bytes())
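For the rest of the task in the question (writing the .csv to another container), here is a minimal sketch rather than a definitive implementation. The function names excel_bytes_to_csv_bytes and convert_blob are hypothetical, the destination container is assumed to exist already, and reading .xlsx is assumed to go through the openpyxl engine. It uses readall() and wraps the result in BytesIO, since newer azure-storage-blob releases mark content_as_bytes() as deprecated and recent pandas releases warn when raw bytes are passed to read_excel directly.

```python
from io import BytesIO

import pandas as pd


def excel_bytes_to_csv_bytes(excel_bytes: bytes, sheet_name=0) -> bytes:
    """Parse .xlsx content held in memory and return it as UTF-8 CSV bytes."""
    # Wrapping the bytes in BytesIO gives pandas a seekable file-like object.
    df = pd.read_excel(BytesIO(excel_bytes), sheet_name=sheet_name)
    return df.to_csv(index=False).encode("utf-8")


def convert_blob(conn_str, source_container, dest_container, blob_name):
    """Download an .xlsx blob, convert it to CSV, and upload the result.

    Hypothetical helper: both containers are assumed to exist already.
    """
    # Imported here so the pure-pandas helper above works without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)

    # readall() drains the StorageStreamDownloader into a bytes object.
    source = service.get_blob_client(container=source_container, blob=blob_name)
    excel_bytes = source.download_blob().readall()

    # Reuse the blob name with a .csv extension for the destination blob.
    csv_name = blob_name.rsplit(".", 1)[0] + ".csv"
    dest = service.get_blob_client(container=dest_container, blob=csv_name)
    dest.upload_blob(excel_bytes_to_csv_bytes(excel_bytes), overwrite=True)
    return csv_name
```

With the names from the question, the call would look like convert_blob(conn_str, "source", "output", "prova.xlsx"), where "output" is the destination container name taken from the title. Everything stays in memory, which is fine for small workbooks; very large files may need a temporary file instead.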