
Copy Files/Folders in Azure Data Lake Gen1

In Azure Data Lake Storage Gen1 I can browse the folder structure, see folders and files, etc. I can perform actions on the files, like renaming or deleting them, and more.

One operation that is missing from the Azure portal and other tools is the option to create a copy of a folder or a file.

I have tried to do it using PowerShell and the portal itself, and it seems that this option is not available.

Is there a reason for that?

Are there any other options to copy a folder in Data Lake?

The Data Lake storage is used as part of an HDInsight cluster.

You can use Azure Storage Explorer to copy files and folders.

  1. Open Storage Explorer.
  2. In the left pane, expand Local and Attached.
  3. Right-click Data Lake Store and, from the context menu, select Connect to Data Lake Store....
  4. Enter the URI; the tool then navigates to the location you just entered.
  5. Select the file/folder you want to copy and click Copy.
  6. Navigate to your desired destination.
  7. Click Paste.

Other options for copying files and folders in a Data Lake include:

My suggestion is to use Azure Data Factory (ADF). It is the fastest way to copy large files or folders: in my experience, a 10 GB file is copied in approximately 1 minute 20 seconds. You just need to create a simple pipeline with one data store, which is used as both the source and the destination data store.

Using Azure Storage Explorer (ASE) to copy large files is too slow: 1 GB takes more than 10 minutes. Copying files with ASE is the operation most similar to most file explorers (Copy/Paste), unlike ADF copying, which requires creating a pipeline. I think creating a simple pipeline is worth the effort, especially because the pipeline can be reused for copying other files or folders with minimal editing (see the sketch below).
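As a rough illustration (not the exact pipeline from this answer), a minimal ADF v2 copy pipeline might look like the JSON sketch below. The pipeline and dataset names (CopyAdlsFolder, AdlsSourceFolder, AdlsTargetFolder) are hypothetical placeholders; both datasets would point at the same Data Lake Store Gen1 account and differ only in folder path:

{
  "name": "CopyAdlsFolder",
  "properties": {
    "activities": [
      {
        "name": "CopyFolder",
        "type": "Copy",
        "inputs": [ { "referenceName": "AdlsSourceFolder", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AdlsTargetFolder", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "AzureDataLakeStoreSource", "recursive": true },
          "sink": { "type": "AzureDataLakeStoreSink", "copyBehavior": "PreserveHierarchy" }
        }
      }
    ]
  }
}

Reusing the pipeline for another folder then only requires editing the dataset paths.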

I agree with the answer above: you can use ADF to copy the file. Just make sure it does not add to your costs. Microsoft Azure Storage Explorer (MASE) is also a good option for copying blobs.

If you have very big files, then the option below is faster:

AzCopy:

Download a single file from blob storage to a local directory:

AzCopy /Source:https://<StorageAccountName>.blob.core.windows.net/<BlobFolderName(if any)> /Dest:C:\ABC /SourceKey:<BlobAccessKey> /Pattern:"<fileName>"
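Note that this example only downloads the file; completing a copy this way means re-uploading it afterwards. A sketch of the upload direction, assuming the same AzCopy v8 syntax (the container name is a placeholder):

AzCopy /Source:C:\ABC /Dest:https://<StorageAccountName>.blob.core.windows.net/<ContainerName> /DestKey:<BlobAccessKey> /Pattern:"<fileName>"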

If you are using Azure Data Lake Store with HDInsight, another very performant option is to use the native Hadoop file system commands, like hdfs dfs -cp, or, if you want to copy a large number of files, distcp. For example:

hadoop distcp adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/sourcefolder adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/targetfolder
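For a single file or a small number of files, the plain hdfs dfs -cp mentioned above is enough: distcp launches a distributed MapReduce job, while -cp is a simple client-side copy. A minimal sketch with the same placeholder names (<file> is a placeholder):

hdfs dfs -cp adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/sourcefolder/<file> adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/targetfolder/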

This is also a good option if you are using multiple storage accounts. See also the documentation.
