Azure Function Python write to Azure DataLake Gen2
I want to write a file to my Azure DataLake Gen2 with an Azure Function and Python.
Unfortunately I'm having the following authentication issue:
Exception: ClientAuthenticationError: (InvalidAuthenticationInfo) Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
'WWW-Authenticate': 'REDACTED'
Both my account and the Function app should have the necessary roles assigned for accessing my DataLake.
And here is my function:
import datetime
import logging

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    credential = DefaultAzureCredential()
    service_client = DataLakeServiceClient(account_url="https://<datalake_name>.dfs.core.windows.net", credential=credential)
    file_system_client = service_client.get_file_system_client(file_system="temp")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    logging.info('Python timer trigger function ran at %s', utc_timestamp)
What am I missing?
THX & BR
Peter
The problem seems to come from DefaultAzureCredential.
The identity that DefaultAzureCredential uses depends on the environment. When an access token is needed, it requests one using these identities in turn, stopping when one provides a token:
1. A service principal configured by environment variables.
2. An Azure managed identity.
3. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio.
4. The user currently signed in to Visual Studio Code.
5. The identity currently logged in to the Azure CLI.
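To make the "in turn, stopping when one provides a token" behaviour concrete, here is a small pure-Python sketch. It is not the azure-identity implementation; the function first_available_token and the simulated sources are hypothetical names used only for illustration. It shows why the identity that ends up being used may not be the one you assigned roles to:

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for a credential source: returns a token string,
# or None when that source is not available in the current environment.
TokenSource = Callable[[], Optional[str]]


def first_available_token(sources: Sequence[Tuple[str, TokenSource]]) -> Tuple[str, str]:
    """Try each source in order; return (source_name, token) for the first hit."""
    for name, get_token in sources:
        token = get_token()
        if token is not None:
            return name, token
    raise RuntimeError("no credential source could provide a token")


# Simulated environment (assumption): no service principal variables are set,
# a managed identity is present, and an Azure CLI login also exists.
chain = [
    ("environment", lambda: None),
    ("managed_identity", lambda: "token-from-managed-identity"),
    ("azure_cli", lambda: "token-from-cli"),
]

name, token = first_available_token(chain)
print(name, token)
```

In this simulated chain the managed identity wins and the CLI login is never consulted, so any roles granted only to the CLI user would not help.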
In fact, you can create the DataLake service client without using the default credentials at all. You can do this (connect directly using the connection string):
import logging
import datetime

from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    service_client = DataLakeServiceClient.from_connection_string(connect_str)
    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    return func.HttpResponse(
        "Test.",
        status_code=200
    )
In addition, to make sure the write goes through, please check whether your DataLake has access restrictions.
The function suggested by Bowman Zhu contains an error. According to the Azure documentation, the parameter "length" expects the length in bytes. However, the suggested function uses the length in characters. Some of these characters may consist of multiple bytes. In such cases the function will not write all bytes of file_contents to the file, and thus cause data loss!
Therefore,
file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))
must be something like:
length = len(file_contents.encode())
file_client.append_data(data=file_contents, offset=0, length=length)
file_client.flush_data(offset=length)
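To see the difference between the two lengths without touching Azure at all, here is a minimal sketch in plain Python. The sample string and variable names are my own; it only shows what it would mean if just the first len(text) bytes were sent:

```python
# "café" written with an escape so the source encoding cannot change the result.
file_contents = "caf\u00e9"

char_count = len(file_contents)        # number of characters
data = file_contents.encode("utf-8")   # the bytes that would actually be written
byte_count = len(data)                 # number of bytes

# What would remain if only char_count bytes were taken from the payload:
truncated = data[:char_count]

print(char_count, byte_count, truncated != data)
```

Here the character count is one less than the byte count, so the last byte of the "é" would be cut off. Encoding once and using the byte length for both append_data and flush_data, as in the corrected snippet above, avoids that.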