
Uploading folders to Azure Data Lake Storage fails with both Azure Storage Explorer and the Python SDK

I am trying to upload my on-premises data to Azure Data Lake Storage. The data is about 10 GB in total and divided into multiple folders. I have tried multiple ways to upload the files; the size of each file varies from a few KB to 56 MB, and all are binary data files.

Firstly, I tried to upload them using the Python SDK for Azure Data Lake with the following function:

def upload_file_to_directory_bulk(filesystem_name, directory_name, fname_local, fname_uploaded):
    try:
        # service_client is an already authenticated DataLakeServiceClient
        file_system_client = service_client.get_file_system_client(file_system=filesystem_name)
        directory_client = file_system_client.get_directory_client(directory_name)
        file_client = directory_client.get_file_client(fname_uploaded)

        # Read the local file and upload its contents in a single call
        local_file = open(fname_local, 'r', encoding='latin-1')
        file_contents = local_file.read()
        file_client.upload_data(file_contents, length=len(file_contents), overwrite=True, validate_content=True)
        local_file.close()
    except Exception as e:
        print(e)
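For reference, service_client in the function above is an authenticated DataLakeServiceClient created along these lines (the account name and key here are placeholders):

from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account name and key; substitute your own values.
service_client = DataLakeServiceClient(
    account_url="https://<storage-account-name>.dfs.core.windows.net",
    credential="<storage-account-key>"
)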

The problem with this function is that it either skips some files in the local folder, or some of the uploaded files do not have the same size as the corresponding local file.

The second method I tried was uploading the whole folder with Azure Storage Explorer, but Storage Explorer would crash/fail after uploading about 90 to 100 files. Is there any way to see the logs and find out why it stopped?

Thirdly, I uploaded the files manually through the Azure Portal, but that was a complete mess as it also failed on some files.

Can anyone guide me on how to bulk-upload data to Azure Data Lake? And what could be causing the problems in these three methods?

Uploading files using the Azure Portal is the easiest and most reliable option. I'm not sure what exactly you are doing wrong, assuming you have a reliable internet connection.

I have uploaded around 2.67 GB of data containing 691 files, and it went through easily without any issue. Many of the files are 75+ MB in size. See the image below.

(screenshot of the uploaded files)

If you split your data into 4 groups and then upload each group separately, you should be able to upload the files without any issue.
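As a minimal sketch of that approach with the same Python SDK, assuming an authenticated DataLakeServiceClient and placeholder account, filesystem, and folder names, you could batch the local files into groups and upload each group; reading in binary mode keeps the uploaded size identical to the local size:

import os
from azure.storage.filedatalake import DataLakeServiceClient

# All names below are placeholders; replace them with your own values.
service_client = DataLakeServiceClient(
    account_url="https://<storage-account-name>.dfs.core.windows.net",
    credential="<storage-account-key>"
)
file_system_client = service_client.get_file_system_client(file_system="<filesystem-name>")

local_root = r"C:\data\to_upload"        # hypothetical local folder
files = sorted(os.listdir(local_root))
group_size = len(files) // 4 + 1         # split into roughly four groups

for start in range(0, len(files), group_size):
    for name in files[start:start + group_size]:
        # Read in binary mode so the uploaded size matches the local file size.
        with open(os.path.join(local_root, name), "rb") as f:
            data = f.read()
        file_client = file_system_client.get_file_client("<target-directory>/" + name)
        file_client.upload_data(data, overwrite=True)
        print("uploaded", name)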

Another Approach

You can use AzCopy to upload the data.

AzCopy is a command-line utility that you can use to copy blobs or files to or from a storage account.

It can easily upload large files with some simple command-line commands.
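For example, a typical AzCopy command to upload a local folder recursively to a Data Lake Storage Gen2 filesystem looks like this (the local path, account name, filesystem, directory, and SAS token are placeholders):

azcopy copy "C:\data\to_upload" "https://<storage-account-name>.dfs.core.windows.net/<filesystem-name>/<directory>?<SAS-token>" --recursive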

Refer: Get started with AzCopy and Upload files to Azure Blob storage by using AzCopy.
