How to remove extra files when sinking CSV files to Azure Data Lake Gen2 with an Azure Data Factory data flow?
I have completed the data flow tutorial. The sink currently creates 4 files in Azure Data Lake Gen2. I suppose this is related to the HDFS file system.

Is it possible to save the output without the success, committed and started files? What is the best practice? Should they be removed after saving to Data Lake Gen2? Are they needed for further data processing?
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow
There are a couple of options available.
You can specify the output file name in the Sink transformation settings. Select "Output to single file" from the File name option dropdown and enter the output file name. You can also parameterize the output file name as required. Refer to this SO thread.
Alternatively, you can add a Delete activity after the Data Flow activity in the pipeline to delete the files from the folder.
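A minimal sketch of the cleanup logic, assuming the marker files are named `_SUCCESS`, `_committed_<id>` and `_started_<id>` (these exact names are an assumption based on the question, not something the answer confirms). The path listing and the delete operation are passed in as plain arguments, so the filter can be tried without a storage account; `is_marker_file` and `delete_marker_files` are hypothetical helpers, not part of any Azure SDK.

```python
import re
from typing import Callable, Iterable, List

# Assumed marker-file names (_SUCCESS, _committed_*, _started_*); adjust the
# pattern if your sink writes differently named markers.
MARKER_PATTERN = re.compile(r"^(_SUCCESS|_committed_.*|_started_.*)$")


def is_marker_file(path: str) -> bool:
    """Return True if the last path segment looks like a commit marker file."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return bool(MARKER_PATTERN.match(name))


def delete_marker_files(paths: Iterable[str], delete: Callable[[str], None]) -> List[str]:
    """Call `delete` on every marker file in `paths` and return those paths.

    Data files such as part-*.csv are left untouched.
    """
    deleted = []
    for path in paths:
        if is_marker_file(path):
            delete(path)
            deleted.append(path)
    return deleted


if __name__ == "__main__":
    # Hypothetical folder listing of a data flow sink output.
    listing = [
        "output/part-00000-abc.csv",
        "output/_SUCCESS",
        "output/_committed_123",
        "output/_started_123",
    ]
    # Here "delete" only prints; swap in a real delete call to remove files.
    removed = delete_marker_files(listing, lambda p: print("deleting", p))
    print(removed)
```

If you use the `azure-storage-file-datalake` package, the path list could come from `FileSystemClient.get_paths()` (using each item's `name`) and the delete callback could be `FileSystemClient.delete_file`; treat that wiring as an untested assumption. Inside the pipeline itself, the Delete activity does the same job without custom code.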