
Apache Beam/Google Dataflow - Exporting Data from Google Datastore to File in Cloud Storage

I need to create a report file in response to a user request. Each user selects the filters for the report, and my application should generate a file in Cloud Storage and send a notification with a link to the generated file.

This is the application workflow:

  1. The client selects a filter and requests a report file.
  2. The application receives this request and creates a record in Datastore with the user's selected filter.
  3. The Datastore key of the new record, as a URL-safe string, is published to Pub/Sub.
  4. The Dataflow pipeline reads the key from Pub/Sub (a minimal pipeline sketch follows this list).
  5. The pipeline generates a file in Google Cloud Storage.
  6. The pipeline notifies the client with the storage file URL.
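
As a rough illustration of steps 4-6, here is a minimal Beam (Java) pipeline sketch; it is not the asker's actual code. The project, subscription path, and the body of the `DoFn` are assumptions, with the Datastore lookup and file write left as comments:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class ReportPipeline {

  // Receives one URL-safe Datastore key per Pub/Sub message.
  static class GenerateReportFn extends DoFn<String, Void> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      String urlSafeKey = c.element();
      // 1. Look up the filter record in Datastore by this key.
      // 2. Run the filtered query and build the report body.
      // 3. Write the result to Cloud Storage (see the naming sketch further down).
      // 4. Publish a notification carrying the resulting file URL.
    }
  }

  public static void main(String[] args) {
    // Pub/Sub input implies a streaming pipeline on Dataflow.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadKeys", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/report-requests"))
     .apply("GenerateReport", ParDo.of(new GenerateReportFn()));
    p.run();
  }
}
```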

Is it possible to create a file for each Pub/Sub message?

How do I create a file with a custom name?

Is this architecture correct?
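
Regarding the first two questions, one possible approach is to write each object from inside the `DoFn` using the Cloud Storage client, rather than `TextIO`, which shards output under generated names. This is only a sketch: the bucket name, the naming scheme, and the `buildReportContent` helper are all hypothetical.

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.apache.beam.sdk.transforms.DoFn;
import java.nio.charset.StandardCharsets;

// Writes one custom-named object per element, bypassing TextIO's sharded naming.
class WriteReportFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    String urlSafeKey = c.element();
    // Hypothetical naming scheme: one object per request key.
    BlobId blobId = BlobId.of("my-report-bucket", "reports/" + urlSafeKey + ".csv");
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/csv").build();
    // Client created per element for simplicity; a @Setup method would reuse it.
    Storage storage = StorageOptions.getDefaultInstance().getService();
    storage.create(blobInfo, buildReportContent(urlSafeKey).getBytes(StandardCharsets.UTF_8));
    // Emit the object's URL so a downstream step can notify the client.
    c.output("https://storage.googleapis.com/my-report-bucket/reports/" + urlSafeKey + ".csv");
  }

  private String buildReportContent(String key) {
    // Placeholder: fetch the filter entity from Datastore and run the query here.
    return "report for " + key;
  }
}
```

Writing from inside a `DoFn` gives one object per element and full control over the object name, at the cost of handling idempotency yourself, since Beam may re-execute elements on retry.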

Your use case sounds as if it would be more applicable to Google Cloud Storage than Cloud Datastore. Google Cloud Storage is meant for opaque, file-like blobs of data, and provides a way to receive Pub/Sub notifications on file updates: https://cloud.google.com/storage/docs/pubsub-notifications

However, it's a bit unclear why you're using the indirection of Pub/Sub and Datastore in this case. Could the server handling the client request instead make a call directly to the Google Cloud Storage API?
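
If the report can be produced synchronously, the request-handling server could upload it directly and hand back a link. A minimal sketch of that alternative follows; the bucket and object names are placeholders, and the signed URL assumes the server runs with credentials that can sign.

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

public class DirectReportUpload {
  public static URL uploadReport(String bucket, String objectName, String reportContent) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucket, objectName))
        .setContentType("text/csv")
        .build();
    // Upload the report bytes directly from the request-handling server.
    storage.create(blobInfo, reportContent.getBytes(StandardCharsets.UTF_8));
    // Return a time-limited signed URL instead of making the object public.
    return storage.signUrl(blobInfo, 7, TimeUnit.DAYS, Storage.SignUrlOption.withV4Signature());
  }
}
```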

