Move only the files that were read by a Google Cloud Data Fusion pipeline
I have a pipeline, scheduled to run within a limited window (30 minutes), whose source is a GCS bucket and whose sink is BigQuery. After each run, I want to move only the files that were processed by that run. However, under Conditions and Actions only GCS Move is available, and it cannot discriminate between files in the source bucket: it moves the entire contents. This causes data loss when a new execution starts while a previous one has been running for more than 30 minutes.
Any ideas on how to approach this case?
The GCS Move plugin does not support filters, which would have helped here, I guess. There is an existing JIRA to track this: https://cdap.atlassian.net/browse/PLUGIN-698
A workaround is to use the File Move plugin, which has wildcard support.
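To illustrate why wildcard support solves the problem: if each run is scoped to a pattern (for example, a date partition in the object path), a wildcard move relocates only the files that run actually read, leaving later arrivals untouched for the next run. A minimal sketch of that selection logic, using Python's `fnmatch` as a stand-in for the plugin's wildcard matching (the object names and the pattern below are hypothetical):

```python
from fnmatch import fnmatch

def select_processed(object_names, pattern):
    """Return only the object names matching the run's wildcard pattern.

    The File Move plugin applies a comparable wildcard filter, so only
    files picked up by the current run are moved; files that land in
    the bucket later stay in place for the next execution.
    """
    return [name for name in object_names if fnmatch(name, pattern)]

# Files present in the source bucket at different times (hypothetical).
objects = [
    "input/2024-01-15/orders.csv",   # read by the current run
    "input/2024-01-15/users.csv",    # read by the current run
    "input/2024-01-16/orders.csv",   # arrived after the run started
]

# Pattern scoping the run to a single date partition.
print(select_processed(objects, "input/2024-01-15/*"))
# → ['input/2024-01-15/orders.csv', 'input/2024-01-15/users.csv']
```

The key design point is that both the source stage and the move action must use the same pattern, so the set of files read and the set of files moved are guaranteed to coincide.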