
Azure Function Python write to Azure DataLake Gen2

I want to write a file to my Azure DataLake Gen2 with an Azure Function and Python.

Unfortunately I'm having the following authentication issue:

Exception: ClientAuthenticationError: (InvalidAuthenticationInfo) Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

'WWW-Authenticate': 'REDACTED'

Both my account and the Function App should already have the roles required to access my DataLake assigned.

And here is my function:

import datetime
import logging

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    credential = DefaultAzureCredential()
    service_client = DataLakeServiceClient(account_url="https://<datalake_name>.dfs.core.windows.net", credential=credential)

    file_system_client = service_client.get_file_system_client(file_system="temp")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))


    logging.info('Python timer trigger function ran at %s', utc_timestamp)

What am I missing?

THX & BR

Peter

The problem seems to come from the DefaultAzureCredential.

The identity DefaultAzureCredential uses depends on the environment. When an access token is needed, it tries these identities in turn, stopping when one provides a token:

1. A service principal configured by environment variables. 
2. An Azure managed identity. 
3. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio.
4. The user currently signed in to Visual Studio Code.
5. The identity currently logged in to the Azure CLI.
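When the function runs in Azure, the managed identity (item 2 above) is typically the credential that gets used, and InvalidAuthenticationInfo often means that identity lacks a data-plane role on the storage account. A hedged sketch of wiring this up with the Azure CLI; all names in angle brackets are placeholders, and "Storage Blob Data Contributor" is the usual role for read/write access to ADLS Gen2:

```shell
# Enable the system-assigned managed identity on the Function App
# (prints the identity's principalId on success)
az functionapp identity assign \
    --name <function_app_name> \
    --resource-group <resource_group>

# Grant that identity a data-plane role on the storage account.
# Note: "Owner"/"Contributor" alone are NOT enough for data access.
az role assignment create \
    --assignee <principal_id_from_previous_command> \
    --role "Storage Blob Data Contributor" \
    --scope "/subscriptions/<sub_id>/resourceGroups/<resource_group>/providers/Microsoft.Storage/storageAccounts/<datalake_name>"
```

Role assignments can take a few minutes to propagate, so an immediate retry after assignment may still fail.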

In fact, you can create the Data Lake service client without using the default credentials at all. You can do this instead (connect directly using the connection string):

import logging
import datetime

from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    service_client = DataLakeServiceClient.from_connection_string(connect_str)

    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    return func.HttpResponse(
            "Test.",
            status_code=200
    )
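As an aside, a storage connection string is just a semicolon-separated list of key=value pairs, which is what `from_connection_string` parses internally. A small stand-alone helper (hypothetical, not part of the SDK) makes the structure visible:

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure storage connection string into its key/value parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue
        # partition() splits on the FIRST '=', so base64 account keys
        # containing '=' padding are preserved intact in the value.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts

sample = ("DefaultEndpointsProtocol=https;AccountName=mydatalake;"
          "AccountKey=xxxxxx;EndpointSuffix=core.windows.net")
print(parse_connection_string(sample)["AccountName"])  # mydatalake
```

In practice the connection string should come from the Function App's application settings (an environment variable) rather than being hard-coded in source, since it embeds the account key.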

In addition, to make sure the data can be written successfully, check whether your Data Lake has any network access restrictions.

The function suggested by Bowman Zhu contains an error. According to the Azure documentation, the "length" parameter expects a length in bytes. However, the suggested function passes a length in characters. Some characters may consist of multiple bytes; in such cases the function will not write all bytes of file_contents to the file, and thus cause data loss!

Therefore,

file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))

must be something like:

length = len(file_contents.encode())
file_client.append_data(data=file_contents, offset=0, length=length)
file_client.flush_data(offset=length)
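The gap between character count and byte count only appears with non-ASCII content, which is exactly when the original call would silently truncate the upload. A minimal stand-alone illustration (no Azure involved; the sample strings are made up):

```python
# len() counts characters; the Data Lake API expects a byte count.
ascii_text = "some data"
unicode_text = "smörgåsbord"  # 'ö' and 'å' are 2 bytes each in UTF-8

print(len(ascii_text), len(ascii_text.encode("utf-8")))      # identical for ASCII
print(len(unicode_text), len(unicode_text.encode("utf-8")))  # 11 characters, 13 bytes

# Passing len(unicode_text) as the byte length would drop the
# final 2 bytes of the encoded payload on upload.
length = len(unicode_text.encode("utf-8"))
```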
