
Running Google Dataflow with PubsubIO source for testing

I'm creating a data-processing application using Google Cloud Dataflow - it is going to stream data from Pubsub to BigQuery.

I'm somewhat bewildered by the infrastructure. I created an application prototype and can run it locally, using files (with TextIO) as the source and destination.

However, if I change the source to PubsubIO.Read.subscription(...), it fails with "java.lang.IllegalStateException: no evaluator registered for PubsubIO.Read" (which is not much of a surprise, since I see no way to pass authentication anyway).

But how am I supposed to run this? Should I create a virtual machine in Google Compute Engine and deploy things there, or am I supposed to describe a job somehow and submit it to the Dataflow API (without managing any explicit VMs)?

Could you please point me to some step-by-step instructions on this topic, or briefly explain the workflow? I'm sorry if the question is silly.

You would need to run your pipeline on the Google Cloud infrastructure in order to access Pub/Sub, see: https://cloud.google.com/dataflow/pipelines/specifying-exec-params#CloudExecution

From their page:

// Imports from the Dataflow SDK for Java 1.x.
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

// Create and set your PipelineOptions.
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);

// For Cloud execution, set the Cloud Platform project, staging location,
// and specify DataflowPipelineRunner or BlockingDataflowPipelineRunner.
options.setProject("my-project-id");
options.setStagingLocation("gs://my-bucket/binaries");
options.setRunner(DataflowPipelineRunner.class);

// Create the Pipeline with the specified options.
Pipeline p = Pipeline.create(options);

// Specify all the pipeline reads, transforms, and writes.
...

// Run the pipeline.
p.run();
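To make the elided reads, transforms, and writes concrete for a Pub/Sub-to-BigQuery job, here is a minimal sketch against the Dataflow SDK for Java 1.x. The project ID, subscription path, table spec, column name, and row-conversion logic are placeholder assumptions for illustration, not values from the original question:

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

import java.util.Arrays;

public class PubsubToBigQuery {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project-id");
    options.setStagingLocation("gs://my-bucket/binaries");
    // Pub/Sub is an unbounded source, so the pipeline must run in streaming mode.
    options.setStreaming(true);
    options.setRunner(DataflowPipelineRunner.class);

    Pipeline p = Pipeline.create(options);

    // Hypothetical schema: a single STRING column named "message".
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("message").setType("STRING")));

    p.apply(PubsubIO.Read.subscription(
            "projects/my-project-id/subscriptions/my-subscription"))
     .apply(ParDo.of(new DoFn<String, TableRow>() {
        @Override
        public void processElement(ProcessContext c) {
          // Wrap each Pub/Sub message payload in a BigQuery row.
          c.output(new TableRow().set("message", c.element()));
        }
     }))
     .apply(BigQueryIO.Write
         .to("my-project-id:my_dataset.my_table")
         .withSchema(schema)
         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```

Because the runner is DataflowPipelineRunner rather than the local DirectPipelineRunner, submitting this program uploads your staged binaries and runs the job on Google-managed workers - no VM needs to be created or administered by hand, which answers the question about explicit VMs.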
