
AZURE Function read XLSX from AZURE BLOB

I want to use an Azure Function app to read an XLSX file from Azure Blob Storage. The function will be called through a REST API call. I can access the blob and download the file, but I'm struggling to read the file's content directly with pandas. I have been searching and trying for hours but can't find a solution. My latest approach looks like this:

def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info('Python HTTP trigger function processed a request.')

        blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
        container_client = blob_service_client.get_container_client(CONTAINERNAME)
        blob_client = blob_service_client.get_blob_client(container = CONTAINERNAME, blob=BLOBNAME)
        blob = BlobClient(ACCOUNT_URL, CONTAINERNAME, BLOBNAME)

        #READ PRODUCTS FILE
        blob_client.download_blob().readinto(LOCALFILENAME)
        df = pd.read_excel(blob_client.download_blob())

On the MS homepage*, there is an example that downloads a file from a blob and processes it afterwards, but since I'm using a function app, it doesn't seem to make sense to download the file first, unless I'm missing something.

* https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

The autocomplete for blob gives me the following options, but they are not really helpful either: [image: autocomplete suggestions for blob]

The error messages differ depending on how I try to read the file. The current one is:

System.Private.CoreLib: Exception while executing function: Functions.TrainProductModel. System.Private.CoreLib: Result: Failure
Exception: AttributeError: 'str' object has no attribute 'write'

But I think there is something fundamentally wrong with my approach. The desired result is to read the file directly into a pandas DataFrame.

I'd appreciate any support, as this is blocking progress on my master's thesis :/

Pandas itself cannot parse xlsx files; it relies on the external library xlrd to do so. You shouldn't install a recent version of xlrd, because recent versions (2.0 and later) dropped support for xlsx files (only xls files are supported). The recommended version is 1.2.0 (this works for me).
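As a hedged alternative to pinning xlrd, pandas' read_excel accepts an engine argument, so the parser can be chosen by file extension. The sketch below assumes openpyxl is installed for xlsx files; the helper name pick_excel_engine is hypothetical and only for illustration.

```python
from pathlib import Path


def pick_excel_engine(filename: str) -> str:
    """Return a pandas read_excel engine name suited to the file extension.

    Hypothetical helper: assumes xlrd is installed for .xls files and
    openpyxl for .xlsx/.xlsm files.
    """
    suffix = Path(filename).suffix.lower()
    if suffix == ".xls":
        return "xlrd"
    if suffix in {".xlsx", ".xlsm"}:
        return "openpyxl"
    raise ValueError(f"Unsupported spreadsheet extension: {suffix!r}")
```

It could then be used as pd.read_excel(path, engine=pick_excel_engine(path)), which avoids depending on which xlrd version happens to be installed.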

Below is my code:

from azure.storage.blob import BlobServiceClient
import pandas as pd

CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
CONTAINERNAME = "test"
BLOBNAME = "test.xlsx"
LOCALFILENAME = "testx.xlsx"

blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
blob_client = blob_service_client.get_blob_client(container=CONTAINERNAME, blob=BLOBNAME)

# READ PRODUCTS FILE
# Download the blob to a local file; the context manager closes the file handle.
with open(LOCALFILENAME, "wb") as f:
    f.write(blob_client.download_blob().readall())

# Parse the downloaded workbook into a DataFrame.
df = pd.read_excel(LOCALFILENAME)
print(df)

And it works on my side.
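Since the question asks for reading the blob without a local file, here is a minimal sketch of one way that might work: download the blob's bytes and wrap them in io.BytesIO, which pd.read_excel accepts as a file-like object. The AttributeError in the question arises because readinto expects a writable stream, not a file-name string. The helper name blob_to_dataframe and its reader parameter are hypothetical, introduced only for illustration; readall() is assumed to be available on the object returned by download_blob() (azure-storage-blob v12).

```python
import io


def blob_to_dataframe(blob_client, reader=None, **read_kwargs):
    """Download a blob fully into memory and parse it as a spreadsheet.

    blob_client: an object exposing download_blob().readall(), such as
        azure.storage.blob.BlobClient (assumption: v12 SDK).
    reader: callable taking a file-like object; defaults to pandas.read_excel.
    read_kwargs: forwarded to the reader, e.g. sheet_name or engine.
    """
    if reader is None:
        # Imported lazily so the helper can be exercised without pandas.
        import pandas as pd
        reader = pd.read_excel

    # readall() returns the whole blob content as bytes.
    data = blob_client.download_blob().readall()

    # BytesIO gives the parser a seekable in-memory file; nothing hits disk.
    return reader(io.BytesIO(data), **read_kwargs)
```

Inside the function's main, this could be called as df = blob_to_dataframe(blob_client) with the blob_client from the code above. The whole blob is held in memory, so this is probably best suited to modestly sized files.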
