Copy Files/Folders in Azure Data Lake Gen1
In Azure Data Lake Storage Gen1 I can see the folder structure, browse folders and files, and perform actions on files such as renaming or deleting them.

One operation that is missing from the Azure portal and from other tools is the option to create a copy of a folder or a file.

I have tried to do this using PowerShell and the portal itself, and it seems this option is not available.

Is there a reason for that? Are there any other options to copy a folder in Data Lake?

The Data Lake storage is used as part of an HDInsight cluster.
You can use Azure Storage Explorer to copy files and folders. Other options for copying files and folders in a data lake include:
My suggestion is to use Azure Data Factory (ADF). It is the fastest way if you want to copy large files or folders. Based on my experience, a 10 GB file is copied in approximately 1 minute 20 seconds. You just need to create a simple pipeline with one data store, which is used as both the source and the destination data store.
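For reference, the copy activity such a pipeline would contain can be sketched as JSON. This is a hedged sketch, not a complete pipeline: the activity name `CopyFolder` and the dataset names `SourceDataset`/`SinkDataset` are hypothetical placeholders, and only the overall shape follows ADF's copy-activity schema.

```python
import json

# Hypothetical sketch of an ADF copy activity for ADLS Gen1.
# Activity and dataset names below are placeholders.
copy_activity = {
    "name": "CopyFolder",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
    "typeProperties": {
        # "recursive": True copies the whole folder tree, which is
        # what the answer above describes.
        "source": {"type": "AzureDataLakeStoreSource", "recursive": True},
        "sink": {"type": "AzureDataLakeStoreSink"},
    },
}

print(json.dumps(copy_activity, indent=2))
```

Because the source and destination are the same Data Lake store, a single linked service can back both datasets.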
Using Azure Storage Explorer (ASE) to copy large files is too slow: 1 GB takes more than 10 minutes. Copying files with ASE is the operation most similar to most file explorers (Copy/Paste), unlike ADF copying, which requires creating a pipeline. I think creating a simple pipeline is worth the effort, especially because the pipeline can be reused for copying other files or folders with minimal editing.
I agree with the above comment; you can use ADF to copy the file. You just need to watch that it doesn't drive up your costs. Microsoft Azure Storage Explorer (MASE) is also a good option to copy blobs.
If you have very big files, then the option below is faster:

AzCopy:

Download a single file from blob storage to a local directory:
AzCopy /Source:https://<StorageAccountName>.blob.core.windows.net/<BlobFolderName(if any)> /Dest:C:\ABC /SourceKey:<BlobAccessKey> /Pattern:"<fileName>"
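When the same transfer is run repeatedly, the invocation above can be parameterized in a small script. A minimal sketch, assuming the classic Windows AzCopy (v7-style) flag syntax shown above; the account name, container, and file name used below are placeholders, not values from the original answer.

```python
# Build the AzCopy (v7-style) command shown above from parameters.
# All argument values passed in the example call are placeholders.
def build_azcopy_cmd(account, container, local_dir, key, pattern):
    return (
        f"AzCopy /Source:https://{account}.blob.core.windows.net/{container} "
        f"/Dest:{local_dir} /SourceKey:{key} /Pattern:\"{pattern}\""
    )

cmd = build_azcopy_cmd("mystorageacct", "myfolder", r"C:\ABC",
                       "<BlobAccessKey>", "file.csv")
print(cmd)
```

Keeping the access key out of source control (for example, reading it from an environment variable) is advisable in any real script.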
If you are using Azure Data Lake Store with HDInsight, another very performant option is to use the native Hadoop file system commands, such as hdfs dfs -cp, or, if you want to copy a large number of files, distcp. For example:
hadoop distcp adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/sourcefolder adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/targetfolder
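The distcp invocation above follows a fixed adl:// URI scheme. As a small sketch, the command can be assembled from the account and folder names (the values `mydatalake`, `sourcefolder`, and `targetfolder` below are hypothetical placeholders):

```python
# Build the `hadoop distcp` command for ADLS Gen1 adl:// URIs.
# Account and folder names are hypothetical placeholders.
def adl_uri(account, path):
    return f"adl://{account}.azuredatalakestore.net:443/{path.lstrip('/')}"

def build_distcp_cmd(account, source, target):
    return f"hadoop distcp {adl_uri(account, source)} {adl_uri(account, target)}"

cmd = build_distcp_cmd("mydatalake", "sourcefolder", "targetfolder")
print(cmd)
# → hadoop distcp adl://mydatalake.azuredatalakestore.net:443/sourcefolder adl://mydatalake.azuredatalakestore.net:443/targetfolder
```

Because distcp runs as a distributed MapReduce job on the cluster, it parallelizes the copy across many files, which is why it scales well for large folders.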
This is also a good option if you are using multiple storage accounts. See also the documentation.