
Running Google Dataflow with a PubsubIO source for testing

I'm creating a data-processing application using Google Cloud Dataflow - it is going to stream data from Pub/Sub to BigQuery.

I'm somewhat bewildered by the infrastructure. I created a prototype of my application and can run it locally, using files (with TextIO) as source and destination.
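For reference, the working local version is essentially this (simplified; the file paths are placeholders):

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

// Local prototype: read from a file, write to a file. With no runner
// set, the SDK defaults to the local DirectPipelineRunner.
Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
p.apply(TextIO.Read.from("/tmp/input.txt"))   // placeholder input path
 .apply(TextIO.Write.to("/tmp/output"));      // placeholder output prefix
p.run();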

However, if I change the source to PubsubIO.Read.subscription(...), it fails with "java.lang.IllegalStateException: no evaluator registered for PubsubIO.Read" (I'm not much surprised, since I also see no way to pass authentication).
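Essentially the only change is the source line (the subscription path below is a placeholder):

import com.google.cloud.dataflow.sdk.io.PubsubIO;

// Swapping the file source for a Pub/Sub subscription is what triggers
// the "no evaluator registered" exception under the local runner.
p.apply(PubsubIO.Read.subscription(
    "projects/my-project-id/subscriptions/my-subscription"));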

But how am I supposed to run this? Should I create a virtual machine in Google Compute Engine and deploy my code there, or am I supposed to describe the job somehow and submit it to the Dataflow API (without managing any VMs explicitly)?

Could you please point me to some kind of step-by-step instruction on this topic, or briefly explain the workflow? I'm sorry if the question is silly.

You need to run your pipeline on Google Cloud infrastructure in order to access Pub/Sub; see: https://cloud.google.com/dataflow/pipelines/specifying-exec-params#CloudExecution

From their page:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

// Create and set your PipelineOptions.
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);

// For Cloud execution, set the Cloud Platform project, staging location,
// and specify DataflowPipelineRunner or BlockingDataflowPipelineRunner.
options.setProject("my-project-id");
options.setStagingLocation("gs://my-bucket/binaries");
options.setRunner(DataflowPipelineRunner.class);

// Create the Pipeline with the specified options.
Pipeline p = Pipeline.create(options);

// Specify all the pipeline reads, transforms, and writes.
...

// Run the pipeline.
p.run();
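Putting it together, a minimal sketch of a Pub/Sub-to-BigQuery streaming pipeline on top of those options could look like the following, assuming the 1.x Dataflow SDK; the subscription, table name, and one-column schema here are made up for illustration:

import java.util.Collections;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

public class PubsubToBigQuery {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project-id");                   // placeholder
    options.setStagingLocation("gs://my-bucket/binaries"); // placeholder
    options.setRunner(DataflowPipelineRunner.class);
    // An unbounded Pub/Sub source requires streaming mode.
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);

    // Placeholder schema: one string column per Pub/Sub message.
    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("message").setType("STRING")));

    p.apply(PubsubIO.Read.subscription(
            "projects/my-project-id/subscriptions/my-subscription")) // placeholder
     .apply(ParDo.of(new DoFn<String, TableRow>() {
       @Override
       public void processElement(ProcessContext c) {
         // Wrap each message payload in a one-column BigQuery row.
         c.output(new TableRow().set("message", c.element()));
       }
     }))
     .apply(BigQueryIO.Write
         .to("my-project-id:my_dataset.my_table") // placeholder table
         .withSchema(schema));

    p.run();
  }
}

When you run this, the Dataflow service provisions (and later tears down) the worker VMs itself, so you never create Compute Engine instances by hand; credentials are typically picked up from your local gcloud setup rather than passed through PubsubIO.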
