
Trouble connecting to Azure Data Lake Storage Gen2 using the requests module in Python

I am trying to connect to Azure Data Lake Storage Gen2 from Python to read information from JSON files stored there. Having heard that the azure-datalakes module for Python does not work with Gen2 (and having had trouble with it myself), I moved on to connecting via the REST API with Python's requests package. However, reading Microsoft's reference documentation, particularly the part about the required authentication header, has left me even more confused about what to do.

While I have a general understanding of Python, I am still an amateur when it comes to more advanced projects and always need to look things up. This is my first time asking for help rather than searching until I find the answer, so please bear with me.

I found a useful article by Michal Pawlikowski explaining how to connect via PowerShell, which tied up a lot of loose ends, but I am still left with two problems. First, I am unsure how to encode the authentication header correctly, specifically the step "encode this string by using the HMAC-SHA256 algorithm over the UTF-8-encoded signature string". Second, this request will only list the files in the directory, not return the information contained in those files.

Here's what I tried:


date = "Wed, 15 May 2019 14:28:01 GMT"

string_to_sign = 'GET\n\n\n\n\n\n\n\n\n\n\n\nx-ms-date:'+date+'\nx-ms-version:2018-11-09\n/'+STORAGE_ACCOUNT_NAME+'/'+FILE_SYSTEM_NAME+'\nrecursive:true\nresource:fileststem'

signature = #Encoded string_to_sign + key, am unsure how to approach

auth_header = "SharedKey "+STORAGE_ACCOUNT_NAME+":"+signature

headers = {"Authorization" : auth_header, "x-ms-version" : "2018-11-09", "x-ms-date" : date}


req = requests.get("https://"+STORAGE_ACCOUNT_NAME+".dfs.core.windows.net/" + FILE_SYSTEM_NAME + "?recursive=true&resource=filesystem", headers=headers)

I expect req.text to contain the contents of the JSON file, but I always receive a 403 error telling me to make sure my headers are formatted correctly.

If you want to read a file's content, you should use the Path - Read API rather than the list operation.

The code below works on my side:

import requests
import datetime
import hmac
import hashlib
import base64

storage_account_name = 'storage_account_name'
storage_account_key = 'storage_account_key'
api_version = '2018-11-09'
request_time = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')

# Path to the file on ADLS Gen2, in the form <file system name>/<path to file>
FILE_SYSTEM_NAME='dd1/11.txt'

string_params = {
    'verb': 'GET',
    'Content-Encoding': '',
    'Content-Language': '',
    'Content-Length': '',
    'Content-MD5': '',
    'Content-Type': '',
    'Date': '',
    'If-Modified-Since': '',
    'If-Match': '',
    'If-None-Match': '',
    'If-Unmodified-Since': '',
    'Range': '',
    'CanonicalizedHeaders': 'x-ms-date:' + request_time + '\nx-ms-version:' + api_version,
    'CanonicalizedResource': '/' + storage_account_name+'/'+FILE_SYSTEM_NAME
    }

string_to_sign = (string_params['verb'] + '\n' 
                  + string_params['Content-Encoding'] + '\n'
                  + string_params['Content-Language'] + '\n'
                  + string_params['Content-Length'] + '\n'
                  + string_params['Content-MD5'] + '\n' 
                  + string_params['Content-Type'] + '\n' 
                  + string_params['Date'] + '\n' 
                  + string_params['If-Modified-Since'] + '\n'
                  + string_params['If-Match'] + '\n'
                  + string_params['If-None-Match'] + '\n'
                  + string_params['If-Unmodified-Since'] + '\n'
                  + string_params['Range'] + '\n'
                  + string_params['CanonicalizedHeaders']+'\n'
                  + string_params['CanonicalizedResource'])

signed_string = base64.b64encode(hmac.new(base64.b64decode(storage_account_key), msg=string_to_sign.encode('utf-8'), digestmod=hashlib.sha256).digest()).decode()
headers = {
    'x-ms-date' : request_time,
    'x-ms-version' : api_version,
    'Authorization' : ('SharedKey ' + storage_account_name + ':' + signed_string)
}
url = ('https://' + storage_account_name + '.dfs.core.windows.net/'+FILE_SYSTEM_NAME)
r = requests.get(url, headers = headers)

# Print the file content returned by the Read call
print(r.text)

Test result:

(screenshot of the printed file content; image not available)
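
The list request from the question also carries query parameters (recursive and resource), and as far as I understand the Shared Key rules those have to be appended to the CanonicalizedResource as lowercase name:value lines sorted by name. Here is a rough sketch of a reusable helper that covers both the read and the list case. The function name shared_key_headers and its parameters are my own invention, not part of any Azure library, and it assumes the query values are already URL-decoded, no parameter name is repeated, and the request has no body or extra x-ms-* headers.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone


def shared_key_headers(account, key_b64, verb, resource_path, query=None,
                       request_time=None, api_version='2018-11-09'):
    """Hypothetical helper: build the headers for a body-less Shared Key request."""
    if request_time is None:
        request_time = datetime.now(timezone.utc).strftime('%a, %d %b %Y %H:%M:%S GMT')

    # Canonicalized headers: the x-ms-* headers, lowercase, sorted by name.
    canonicalized_headers = ('x-ms-date:' + request_time
                             + '\nx-ms-version:' + api_version)

    # Canonicalized resource: /account/path, then every query parameter
    # as "name:value" on its own line, sorted by lowercase name.
    canonicalized_resource = '/' + account + '/' + resource_path
    for name in sorted(query or {}, key=str.lower):
        canonicalized_resource += '\n' + name.lower() + ':' + query[name]

    # The verb, then the 11 standard headers (all left empty here),
    # then the canonicalized headers and resource, joined by newlines.
    string_to_sign = '\n'.join([verb] + [''] * 11
                               + [canonicalized_headers, canonicalized_resource])

    # HMAC-SHA256 over the UTF-8 string, keyed with the base64-decoded
    # account key; the digest is then base64-encoded again.
    digest = hmac.new(base64.b64decode(key_b64),
                      msg=string_to_sign.encode('utf-8'),
                      digestmod=hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode()

    return {
        'x-ms-date': request_time,
        'x-ms-version': api_version,
        'Authorization': 'SharedKey ' + account + ':' + signature,
    }
```

For the list call in the question you would pass the file system name as resource_path and {'recursive': 'true', 'resource': 'filesystem'} as query, then hand the same dictionary to requests.get through its params argument so the signed parameters and the sent ones match. For a JSON file fetched with the read call, r.json() on the response should give you the parsed content instead of the raw r.text.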
