
Unzip .gz files from azure data lake using python

I am trying to unzip a .gz file stored in Azure Data Lake.

from azure.datalake.store import core, lib

Tenant_Id = '####'
Client_Key = '####'
Client_Id = '####' 
token = lib.auth(tenant_id=Tenant_Id, client_secret=Client_Key, client_id=Client_Id)

store_name = 'root'
# Connecting to adl
adl = core.AzureDLFileSystem(token, store_name=store_name)
# List of .gz files 
list_of_gz_files = adl.ls('/test/2018')
# Would like to unzip the files listed in list_of_gz_files

Is it possible to decompress them with gzip or a similar module?

Here are 3 options for decompressing .gz files in ADL.

1. Use Azure Data Factory to decompress the files with the Copy activity, which has native support for gzip files.


2. Use a Custom Activity in ADF. Create a job in Azure Batch that accesses the data lake and decompresses the files with Python code (using the gzip package).

3. Use a custom extractor in U-SQL. Please refer to this thread: How to preprocess and decompress .gz file on Azure Data Lake store?
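For the Python route in option 2, a minimal sketch might look like the following. It assumes the `adl` object from the question, whose `open(path, mode)` returns a binary file-like object, and it streams each file through the standard `gzip` module rather than loading it into memory. The helper names `gunzip_in_store` and `gunzip_all` are made up for illustration, writing the output next to the source with the `.gz` suffix dropped is an assumption, and the code has not been run against a live store.

```python
import gzip
import shutil


def gunzip_in_store(fs, src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Stream-decompress src_path (.gz) to dst_path on the same filesystem.

    `fs` is any object exposing open(path, mode) that returns a binary
    file-like object, e.g. the `adl` AzureDLFileSystem from the question.
    """
    with fs.open(src_path, 'rb') as raw, \
            gzip.GzipFile(fileobj=raw, mode='rb') as unzipped, \
            fs.open(dst_path, 'wb') as out:
        # Copy in chunks so large files never sit fully in memory.
        shutil.copyfileobj(unzipped, out, chunk_size)
    return dst_path


def gunzip_all(fs, paths):
    """Decompress every path ending in .gz; the output name drops the suffix."""
    return [gunzip_in_store(fs, p, p[:-len('.gz')])
            for p in paths if p.endswith('.gz')]
```

With the variables from the question, this would be called as `gunzip_all(adl, list_of_gz_files)`.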
