
Upload data to the Azure ADLS Gen2 from on-premise using Python or Java

I have an Azure Storage account with Data Lake Gen2. I would like to upload data from on-premise to the Lake Gen2 file systems using Python (or Java).

I have found examples of how to interact with File Shares in the Storage account, but I have not yet found out how to upload to the Lake (rather than the File Share). I have also found how to do it for Gen1 Lakes here, but nothing except closed requests for Gen2.

My question is whether this is even possible with Python as of today; alternatively, how can I upload files to the Gen2 Lake using Java? A code snippet demonstrating the API calls for the upload would be highly appreciated.

According to the official tutorial Quickstart: Upload, download, and list blobs with Python, as noted below, you cannot directly use the Azure Storage SDK for Python to perform any operations in Azure Data Lake Storage Gen2 unless you have enrolled in the public preview of multi-protocol access on Data Lake Storage.

Note

The features described in this article are available to accounts that have a hierarchical namespace only if you enroll in the public preview of multi-protocol access on Data Lake Storage. To review limitations, see the known issues article.

So the only solution for uploading data to ADLS Gen2 is to use the ADLS Gen2 REST APIs; please refer to the reference Azure Data Lake Store REST API.

Here is my sample code to upload data to ADLS Gen2 in Python, and it works fine.

import requests
import json

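# Step 1: acquire an OAuth2 access token from Azure AD with the client-credentials
# grant; the scope https://storage.azure.com/.default is used for ADLS Gen2 REST calls.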
def auth(tenant_id, client_id, client_secret):
    print('auth')
    auth_headers = {
        "Content-Type": "application/x-www-form-urlencoded"
    }
    auth_body = {
        "client_id": client_id,
        "client_secret": client_secret,
        "scope" : "https://storage.azure.com/.default",
        "grant_type" : "client_credentials"
    }
    resp = requests.post(f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token", headers=auth_headers, data=auth_body)
    return (resp.status_code, json.loads(resp.text))

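# Filesystem - Create: PUT https://{account}.dfs.core.windows.net/{filesystem}?resource=filesystem
# creates a new filesystem (container) in the storage account.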
def mkfs(account_name, fs_name, access_token):
    print('mkfs')
    fs_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.put(f"https://{account_name}.dfs.core.windows.net/{fs_name}?resource=filesystem", headers=fs_headers)
    return (resp.status_code, resp.text)

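# Path - Create: PUT ...?resource=directory creates a directory inside the filesystem.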
def mkdir(account_name, fs_name, dir_name, access_token):
    print('mkdir')
    dir_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.put(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{dir_name}?resource=directory", headers=dir_headers)
    return (resp.status_code, resp.text)

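# Path - Create: PUT ...?resource=file creates an empty file; 201 Created on success.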
def touch_file(account_name, fs_name, dir_name, file_name, access_token):
    print('touch_file')
    touch_file_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.put(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{dir_name}/{file_name}?resource=file", headers=touch_file_headers)
    return (resp.status_code, resp.text)

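# Path - Update: PATCH ...?action=append&position={offset} uploads data at the given
# byte offset; appended data stays uncommitted (invisible) until it is flushed.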
def append_file(account_name, fs_name, path, content, position, access_token):
    print('append_file')
    append_file_headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "text/plain",
        "Content-Length": f"{len(content)}"
    }
    resp = requests.patch(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{path}?action=append&position={position}", headers=append_file_headers, data=content)
    return (resp.status_code, resp.text)

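# Path - Update: PATCH ...?action=flush&position={length} commits the appended data;
# position must equal the total length of the file after the flush.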
def flush_file(account_name, fs_name, path, position, access_token):
    print('flush_file')
    flush_file_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.patch(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{path}?action=flush&position={position}", headers=flush_file_headers)
    return (resp.status_code, resp.text)

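# Create the remote file, append the whole local file at offset 0, then flush
# at offset len(content) to commit the upload.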
def mkfile(account_name, fs_name, dir_name, file_name, local_file_name, access_token):
    print('mkfile')
    status_code, result = touch_file(account_name, fs_name, dir_name, file_name, access_token)
    if status_code == 201:
        with open(local_file_name, 'rb') as local_file:
            path = f"{dir_name}/{file_name}"
            content = local_file.read()
            position = 0
            append_file(account_name, fs_name, path, content, position, access_token)
            position = len(content)
            flush_file(account_name, fs_name, path, position, access_token)
    else:
        print(result)


if __name__ == '__main__':
    tenant_id = '<your tenant id>'
    client_id = '<your client id>'
    client_secret = '<your client secret>'

    account_name = '<your adls account name>'
    fs_name = '<your filesystem name>'
    dir_name = '<your directory name>'
    file_name = '<your file name>'
    local_file_name = '<your local file name>'

    # Acquire an Access token
    auth_status_code, auth_result = auth(tenant_id, client_id, client_secret)
    access_token = auth_result['access_token'] if auth_status_code == 200 else ''
    print(access_token)

    # Create a filesystem
    mkfs_status_code, mkfs_result = mkfs(account_name, fs_name, access_token)
    print(mkfs_status_code, mkfs_result)

    # Create a directory
    mkdir_status_code, mkdir_result = mkdir(account_name, fs_name, dir_name, access_token)
    print(mkdir_status_code, mkdir_result)

    # Create a file from local file
    mkfile(account_name, fs_name, dir_name, file_name, local_file_name, access_token)
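
By the way, for a local file that is too large to read into memory at once, the same Append/Flush pattern works chunk by chunk: append each block at the current byte offset, then flush once with the total length. Below is a minimal sketch of such a variant built on the helpers above; the name mkfile_chunked and the 4 MB chunk size are my own assumptions, not part of the REST API.

def mkfile_chunked(account_name, fs_name, dir_name, file_name, local_file_name, access_token, chunk_size=4 * 1024 * 1024):
    print('mkfile_chunked')
    # Create the remote file first, as in mkfile
    status_code, result = touch_file(account_name, fs_name, dir_name, file_name, access_token)
    if status_code != 201:
        print(result)
        return
    path = f"{dir_name}/{file_name}"
    position = 0
    with open(local_file_name, 'rb') as local_file:
        while True:
            chunk = local_file.read(chunk_size)
            if not chunk:
                break
            # Each append writes at the current byte offset and advances it
            append_file(account_name, fs_name, path, chunk, position, access_token)
            position += len(chunk)
    # A single flush with the final position commits everything appended so far
    flush_file(account_name, fs_name, path, position, access_token)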

Hope it helps.
