
Move only the files that were read by a Google Cloud Data Fusion pipeline

I have a pipeline with time-limited executions (30 minutes) whose source is a GCS bucket and whose sink is BigQuery. After each run, I want to move only the files that were actually processed by the pipeline. However, under Conditions and Actions only the GCS Move plugin is available, and it cannot discriminate between files in the source bucket: it moves the entire contents. This causes data loss when a new execution starts while the previous one is still running past the 30-minute window.

Any ideas on how to approach this case?

My pipeline looks like this: [pipeline screenshot]

The GCS Move plugin does not support filters, which would have helped, I guess. There is an existing JIRA ticket to track this: https://cdap.atlassian.net/browse/PLUGIN-698.

A workaround is to use the File Move plugin, which has wildcard support.
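To illustrate the wildcard idea, here is a minimal sketch of selective moving in plain Python. It uses dicts in place of GCS buckets and `fnmatch` for wildcard matching; all names and the pattern are hypothetical, and this is not the plugin's actual implementation, only the selection behavior it enables: files matching the pattern are moved, while files that landed after the run started are left untouched.

```python
from fnmatch import fnmatch

def move_matching(source: dict, dest: dict, pattern: str) -> list:
    """Move only entries whose names match the wildcard pattern.

    `source` and `dest` are plain dicts standing in for buckets here.
    With the File Move plugin, a wildcard like 'input/batch-*.csv'
    plays the same role: it selects which files get moved.
    """
    moved = [name for name in list(source) if fnmatch(name, pattern)]
    for name in moved:
        dest[name] = source.pop(name)  # copy to destination, delete from source
    return moved

# Hypothetical bucket contents: two files processed by the current run,
# plus one that arrived after the run started.
src = {
    "input/batch-001.csv": b"a",
    "input/batch-002.csv": b"b",
    "input/late-arrival.csv": b"c",
}
dst = {}
moved = move_matching(src, dst, "input/batch-*.csv")
# Only the batch files are moved; the late arrival stays in the source.
```

If each run writes its inputs with a run-specific prefix or naming pattern, a wildcard scoped to that pattern moves exactly the processed files and nothing else.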

