
How to remove extra files when sinking CSV files to Azure Data Lake Gen2 with Azure Data Factory data flow?

Original post 2021-09-24 08:31:18 8 1 azure-data-factory

I have completed the data flow tutorial. The sink currently creates 4 files in Azure Data Lake Gen2. I suppose this is related to the HDFS file system.

Is it possible to save without the success, committed, and started files?

What is the best practice? Should they be removed after saving to Data Lake Gen2? Are they needed in further data processing?

https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow

1 answer

There are a couple of options available.

You can set the output file name in the Sink transformation settings.
- From the File name option dropdown, select Output to single file and enter the output file name.
- You can also parameterize the output file name as required. Refer to this SO thread.
You can add a Delete activity after the data flow activity in the pipeline to delete the extra files from the folder.
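
The Delete activity option could be sketched roughly as follows. This is a minimal sketch, not a definitive pipeline definition: it builds the activity as a Python dictionary and prints it as JSON. The activity name `DeleteMarkerFiles`, the data flow activity name `RunDataFlow`, and the dataset name `SinkFolderDataset` are hypothetical. It also assumes the extra files all start with an underscore (for example `_SUCCESS`, `_committed_<id>`, `_started_<id>`), and that the property names match the Delete activity JSON schema, which should be checked against the official documentation.

```python
import json
from fnmatch import fnmatch

# Assumption: every extra file written by the sink starts with an underscore
# (for example _SUCCESS, _committed_<id>, _started_<id>).
MARKER_PATTERN = "_*"


def is_marker_file(name: str) -> bool:
    """Return True if a file name matches the marker-file wildcard."""
    return fnmatch(name, MARKER_PATTERN)


def build_delete_activity(data_flow_activity: str, dataset: str) -> dict:
    """Build a Delete activity definition that runs after the data flow succeeds."""
    return {
        "name": "DeleteMarkerFiles",  # hypothetical activity name
        "type": "Delete",
        "dependsOn": [
            {
                "activity": data_flow_activity,
                "dependencyConditions": ["Succeeded"],
            }
        ],
        "typeProperties": {
            # Dataset pointing at the sink folder (hypothetical name passed in).
            "dataset": {"referenceName": dataset, "type": "DatasetReference"},
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",  # ADLS Gen2 store settings
                "recursive": False,
                "wildcardFileName": MARKER_PATTERN,
            },
            "enableLogging": False,
        },
    }


if __name__ == "__main__":
    # Local check of which names the wildcard would select.
    files = ["_SUCCESS", "_committed_123", "_started_123", "part-00000.csv"]
    print([f for f in files if is_marker_file(f)])
    # Activity definition to paste into the pipeline's "activities" array.
    print(json.dumps(build_delete_activity("RunDataFlow", "SinkFolderDataset"), indent=2))
```

The `is_marker_file` helper only previews the wildcard locally with Python's `fnmatch`. The matching inside Data Factory is done by the service and may differ in edge cases, so it is worth testing the pattern on a copy of the folder first.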

