简体   繁体   中英

AZURE Function read XLSX from AZURE BLOB

I want to utilize an AZURE Function app to read an XLSX File from an AZURE BLOB Storage. The Function shall be called by a REST API call. I can access the blob and download the file but I'm struggling with reading the content of the file with pandas directly. I'm searching and trying for hours but can't find a solution. My latest approach looks like this:

def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info('Python HTTP trigger function processed a request.')

        blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
        container_client = blob_service_client.get_container_client(CONTAINERNAME)
        blob_client = blob_service_client.get_blob_client(container = CONTAINERNAME, blob=BLOBNAME)
        blob = BlobClient(ACCOUNT_URL, CONTAINERNAME, BLOBNAME)

        #READ PRODUCTS FILE
        blob_client.download_blob().readinto(LOCALFILENAME)
        df = pd.read_excel(blob_client.download_blob())

On the MS homepage*, there is one example to download a file from a blob and process it afterwards but as I'm utilizing a function app it doesn't make sense to download the file first if I'm not missing anything...

* https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

The autocomplete for blob gives me the following options but they are also not really helpful: Blob 的自动完成建议

The error messages are different depending on the way I try to read the file. The current one is:

System.Private.CoreLib: Exception while executing function: Functions.TrainProductModel. System.Private.CoreLib: Result: Failure
Exception: AttributeError: 'str' object has no attribute 'write'

But I think there is actually something fundamentally wrong with my approach. The desired result is to read the file directly into a pandas table.

Appreciating any support as this blocking progress with my master thesis:/

Pandas itself does not have the ability to parse xlsx files. Pandas parses xlsx files based on the external library xlrd. And you shouldn't install the high version of xlrd, because the high version cancels the support for xlsx files (only xls files are supported), and the recommended version is 1.2.0 (this is valid for me).

Below is my code:

import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import pandas as pd

CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
CONTAINERNAME = "test"
BLOBNAME = "test.xlsx"
LOCALFILENAME = "testx.xlsx" 

blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINERNAME)
blob_client = blob_service_client.get_blob_client(container = CONTAINERNAME, blob=BLOBNAME)

#READ PRODUCTS FILE
f = open(LOCALFILENAME, "wb")
f.write(blob_client.download_blob().content_as_bytes())
f.close()
df = pd.read_excel(r''+LOCALFILENAME)
print(df)

And it works on my side:

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM