
Copy File/Folders in Azure Data Lake Gen1

In Azure Data Lake Storage Gen1 I can see the folder structure and browse folders and files. I can perform actions on the files, such as renaming or deleting them, and more.

One operation that is missing from the Azure portal and from other tools is the option to create a copy of a folder or a file.

I have tried to do it using PowerShell and using the portal itself, and it seems that this option is not available.

Is there a reason for that?

Are there any other options to copy a folder in Data Lake?

The Data Lake storage is used as part of an HDInsight cluster.

You can use Azure Storage Explorer to copy files and folders.

  1. Open Storage Explorer.
  2. In the left pane, expand Local and Attached.
  3. Right-click Data Lake Store, and - from the context menu - select Connect to Data Lake Store....
  4. Enter the Uri; the tool then navigates to the location of the URL you just entered.
  5. Select the file/folder you want to copy and copy it.
  6. Navigate to your desired destination.
  7. Click Paste.

Other options for copying files and folders in a data lake include:

My suggestion is to use Azure Data Factory (ADF). It is the fastest way if you want to copy large files or folders. In my experience, a 10 GB file is copied in approximately 1 minute 20 seconds. You just need to create a simple pipeline with one data store, which is used as both the source and the destination data store.
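As a minimal sketch, the copy activity in such a pipeline could look like the following (the activity and dataset names CopyAdlsFolder, AdlsSourceDataset, and AdlsSinkDataset are placeholders; the source/sink types follow the copy-activity schema of the Data Lake Store Gen1 connector):

{
    "name": "CopyAdlsFolder",
    "type": "Copy",
    "inputs": [ { "referenceName": "AdlsSourceDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "AdlsSinkDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "AzureDataLakeStoreSource", "recursive": true },
        "sink": { "type": "AzureDataLakeStoreSink", "copyBehavior": "PreserveHierarchy" }
    }
}

Both datasets can point at the same Data Lake Store linked service with different folder paths, which is what makes the pipeline easy to reuse for copying other files or folders.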

Using Azure Storage Explorer (ASE) to copy large files is too slow: 1 GB takes more than 10 minutes. Copying files with ASE is the operation most similar to most file explorers (Copy/Paste), unlike ADF copying, which requires creating a pipeline. I think creating a simple pipeline is worth the effort, especially because the pipeline can be reused for copying other files or folders with minimal editing.

I agree with the above comment; you can use ADF to copy the file. You just need to watch that it does not add to your costs. Microsoft Azure Storage Explorer (MASE) is also a good option for copying blobs.

If you have very big files, then the option below is faster:

AzCopy:

Download a single file from blob storage to a local directory:

AzCopy /Source:https://<StorageAccountName>.blob.core.windows.net/<BlobFolderName(if any)> /Dest:C:\ABC /SourceKey:<BlobAccessKey>  /Pattern:"<fileName>" 
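The same AzCopy (v8) syntax also supports a server-side copy between two blob containers or accounts, which avoids the local round trip; the account names, container names, and keys below are placeholders:

AzCopy /Source:https://<SourceAccountName>.blob.core.windows.net/<SourceContainer> /Dest:https://<DestAccountName>.blob.core.windows.net/<DestContainer> /SourceKey:<SourceAccessKey> /DestKey:<DestAccessKey> /S

The /S flag copies recursively. Note that, as in the download example above, this targets blob storage endpoints, not the adl:// Data Lake Store endpoint.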

If you are using the Azure Data Lake Store with HDInsight, another very performant option is to use the native Hadoop file system commands, such as hdfs dfs -cp or, if you want to copy a large number of files, distcp. For example:

hadoop distcp adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/sourcefolder adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/targetfolder
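For a one-off copy of a single file or a small folder, the plain hdfs dfs -cp mentioned above is enough; the account, folder, and file names are placeholders matching the distcp example:

hdfs dfs -cp adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/sourcefolder/myfile.csv adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/targetfolder/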

Distcp is also a good option if you are using multiple storage accounts. See also the documentation.
