Move only the files that were read by a Google Cloud Data Fusion pipeline
I have a pipeline, scheduled to run within a limited window (30 minutes), whose source is a GCS bucket and whose sink is BigQuery. After each run, I want to move only the files that were processed by that run. However, under Conditions and Actions only GCS Move is available, and it cannot discriminate between files in the source bucket: it moves the entire contents. This causes data loss when a new execution starts while a previous one has been running for more than 30 minutes.
Any ideas on how to approach this case?
The GCS Move plugin does not support filters, which would have helped here, I guess. There is an existing JIRA to track this: https://cdap.atlassian.net/browse/PLUGIN-698
A workaround is to use the File Move plugin, which has wildcard support.
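To illustrate why wildcard support solves the problem: if each run is scoped to a pattern (for example, a date partition in the object path), a wildcard move relocates only the files that run actually read, leaving later arrivals untouched for the next run. A minimal sketch of that selection logic, using Python's `fnmatch` as a stand-in for the plugin's wildcard matching (the object names and the pattern below are hypothetical):

```python
from fnmatch import fnmatch

def select_processed(object_names, pattern):
    """Return only the object names matching the run's wildcard pattern.

    The File Move plugin applies a comparable wildcard filter, so only
    files picked up by the current run are moved; files that land in
    the bucket later stay in place for the next execution.
    """
    return [name for name in object_names if fnmatch(name, pattern)]

# Files present in the source bucket at different times (hypothetical).
objects = [
    "input/2024-01-15/orders.csv",   # read by the current run
    "input/2024-01-15/users.csv",    # read by the current run
    "input/2024-01-16/orders.csv",   # arrived after the run started
]

# Pattern scoping the run to a single date partition.
print(select_processed(objects, "input/2024-01-15/*"))
# → ['input/2024-01-15/orders.csv', 'input/2024-01-15/users.csv']
```

The key design point is that both the source stage and the move action must use the same pattern, so the set of files read and the set of files moved are guaranteed to coincide.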