
Azure Functions binding for Azure Data Lake (Python)

I have a requirement: I want to connect to my Azure Data Lake v2 (ADLS) account from Azure Functions, read a file, process it using Python (PySpark), and write the result back to Azure Data Lake. So both my input and output bindings would be to ADLS. Is there an ADLS binding available for Azure Functions in Python? Could somebody give any suggestions on this?

Thanks, Anten D


1. When we read the data, we can use the blob input binding.

2. When we write the data, we cannot use the blob output binding. (This is because the object is different.) Azure Functions does not support an ADLS output binding, so the write logic has to go in the body of the function.
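As a rough illustration of point 1, the sketch below builds the kind of function.json that declares an HTTP trigger, a blob input binding, and an HTTP response. It is written as a small Python helper only so that it can be printed and checked; the blob path "test/input.txt" and the connection setting name "AzureWebJobsStorage" are assumptions and not values from the question. The binding names req and inputblob match the parameters of the main function in the answer's example.

```python
import json


def build_function_json(blob_path: str, connection_setting: str) -> dict:
    """Return a function.json-style dict: HTTP trigger, blob input, HTTP output."""
    return {
        "scriptFile": "__init__.py",
        "bindings": [
            {
                # HTTP trigger, passed to main() as `req`.
                "authLevel": "function",
                "type": "httpTrigger",
                "direction": "in",
                "name": "req",
                "methods": ["get", "post"],
            },
            {
                # Blob input binding, passed to main() as `inputblob`.
                # `path` is "<container>/<blob name>"; `connection` is the name
                # of an app setting holding the storage connection string.
                "type": "blob",
                "direction": "in",
                "name": "inputblob",
                "path": blob_path,
                "connection": connection_setting,
            },
            {
                # The function's return value becomes the HTTP response.
                "type": "http",
                "direction": "out",
                "name": "$return",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace them with your own container, file and setting.
    config = build_function_json("test/input.txt", "AzureWebJobsStorage")
    print(json.dumps(config, indent=2))
```

Note that no output binding is declared for the data lake: the write is expected to happen inside the function body.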

This doc lists the bindings that Azure Functions supports:

https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings?tabs=csharp#supported-bindings

Below is a simple code example:

import logging

import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    # Connection string of the ADLS account (better kept in an app setting than hard-coded).
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
    myfilesystem = "test"
    myfile       = "FileName.txt"
    file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
    # create_file returns a file client for the newly created file.
    file_client = file_system_client.create_file(myfile)
    # Read the input blob as bytes so that offsets and lengths are byte counts.
    data = inputblob.read()
    inputstr = data.decode("utf-8")
    logging.info("length of data is %d", len(data))
    filesize_previous = 0
    logging.info("length of current file is %d", filesize_previous)
    file_client.append_data(data, offset=filesize_previous, length=len(data))
    # Appended data is only committed once flush_data is called with the final length.
    file_client.flush_data(filesize_previous + len(data))
    return func.HttpResponse(
            "This is a test."+inputstr,
    )

Original Answer:

I think the docs below will help you:

How to read:

https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-input?tabs=csharp

How to write:

https://learn.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python

By the way, don't use the blob output binding. Reading can be achieved with a binding, but writing cannot. (The Blob Storage service and the Data Lake service are based on different objects. Using the blob input binding to read files is completely fine, but please do not use the blob output binding to write files, because it does not create an object based on the Data Lake service.)
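Writing through the Data Lake client is a two-step operation: data is appended at a byte offset and then committed with a flush. The sketch below shows that bookkeeping under some assumptions. write_in_chunks is a hypothetical helper, not part of the SDK; it only expects an object with append_data(data, offset=..., length=...) and flush_data(offset) methods, which is the shape of DataLakeFileClient in azure-storage-file-datalake. InMemoryFileClient is a made-up stand-in so the example runs without an Azure account.

```python
class InMemoryFileClient:
    """Hypothetical stand-in that mimics append_data/flush_data for local runs."""

    def __init__(self):
        self.pending = bytearray()  # appended but not yet committed
        self.committed = b""        # visible content after a flush
        self.appends = []           # (offset, length) of every append call

    def append_data(self, data, offset, length=None):
        length = len(data) if length is None else length
        if offset != len(self.pending):
            raise ValueError("appends must be contiguous")
        self.pending.extend(data[:length])
        self.appends.append((offset, length))

    def flush_data(self, offset):
        # Commit everything up to `offset` bytes.
        self.committed = bytes(self.pending[:offset])


def write_in_chunks(file_client, data: bytes, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Append `data` chunk by chunk, then commit it with one flush.

    Returns the number of bytes written. Offsets and lengths are byte counts,
    so text should be encoded (for example to UTF-8) before calling this.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        return 0  # nothing to append or commit

    offset = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        file_client.append_data(chunk, offset=offset, length=len(chunk))
        offset += len(chunk)

    # Without this call the appended data stays uncommitted.
    file_client.flush_data(offset)
    return offset


if __name__ == "__main__":
    client = InMemoryFileClient()
    written = write_in_chunks(client, "héllo".encode("utf-8"), chunk_size=4)
    print(written, client.committed)
```

In a real function, the object returned by file_system_client.create_file(...) would presumably be passed in place of the in-memory stand-in.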

Let me know whether the docs above help; if not, I will update this answer with a simple Python example.

