
Error when running the WordCount example pipeline on Dataflow with Eclipse

When I try to run the WordCount example pipeline on Dataflow from the Eclipse IDE, I get the following error:

Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)
    at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:162)
    at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
    at org.apache.beam.sdk.Pipeline.create(Pipeline.java:150)
    at com.google.cloud.dataflow.examples.WordCount.main(WordCount.java:178)

Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
    ... 4 more

Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://mysite-ga-datastreaming-196008-my-bucket/', did you mean: 'gs://some-bucket/mysite-ga-datastreaming-196008-my-bucket'?
    at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:383)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPath(GcsPathValidator.java:77)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.validateOutputFilePrefixSupported(GcsPathValidator.java:60)
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:246)
    ... 9 more
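The root cause is raised in GcsPathValidator.validateOutputFilePrefixSupported, called from DataflowRunner.fromOptions: the path 'gs://mysite-ga-datastreaming-196008-my-bucket/' names a bucket but no object (folder or prefix) inside it. The sketch below approximates that rule in plain Java so it can be tried without Beam. The class and method names are hypothetical, and this is not Beam's actual implementation.

```java
// Hypothetical approximation of the "bucket and object" rule; NOT Beam's GcsPathValidator.
class GcsPathCheck {

    // Returns true only for paths of the form gs://<bucket>/<object>,
    // where both the bucket and the object part are non-empty.
    static boolean hasBucketAndObject(String path) {
        if (!path.startsWith("gs://")) {
            return false;
        }
        String rest = path.substring("gs://".length());
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            return false; // empty bucket, or a bare "gs://bucket" with no slash at all
        }
        String object = rest.substring(slash + 1);
        return !object.isEmpty(); // "gs://bucket/" has an empty object part
    }

    public static void main(String[] args) {
        System.out.println(hasBucketAndObject("gs://mysite-ga-datastreaming-196008-my-bucket/"));
        System.out.println(hasBucketAndObject("gs://mysite-ga-datastreaming-196008-my-bucket/staging"));
    }
}
```

The first call prints false (bucket root only, matching the failing path in the trace) and the second prints true.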

Some people suggest that the error is caused by the Java version, since Beam apparently doesn't work well with Java 9. However, I'm still using Java 8. Others say the error occurs because you have to provide a subfolder under your bucket as the storage location. I've tried that, but it still doesn't work.
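To rule out the Java-version theory, it helps to confirm which runtime Eclipse actually uses, since Eclipse launches programs with the JRE selected in the project's run configuration, which is not necessarily the one on your PATH. A minimal sketch, assuming you run it from the same project and launch settings as WordCount; the class name is hypothetical.

```java
// Hypothetical check: prints which Java runtime the current launch configuration uses.
class JavaVersionCheck {

    // "1.8" on Java 8; "9", "10", "11", ... on later releases.
    static String specVersion() {
        return System.getProperty("java.specification.version");
    }

    static boolean isJava8(String spec) {
        return "1.8".equals(spec);
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("Running on Java 8: " + isJava8(specVersion()));
    }
}
```

If java.home points at a different JDK than expected, change the JRE in the run configuration rather than the system default.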

If anyone has faced this issue before or can offer any advice on the error, it would be appreciated.

Before running the pipeline, you should create the bucket gs://mysite-ga-datastreaming-196008-my-bucket/ in Google Cloud Storage.
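The bucket to create is only the name between gs:// and the first slash, without the scheme or the trailing slash. A minimal sketch of a helper that extracts that name from a location string, so it can be given to whatever tool you use to create the bucket (for example the Cloud Console, or gsutil mb gs://mysite-ga-datastreaming-196008-my-bucket). The helper is hypothetical and is not part of Beam or the Cloud Storage client.

```java
// Hypothetical helper: extracts the bucket name from a gs:// location.
class GcsBucketName {

    static String bucketOf(String gcsPath) {
        if (!gcsPath.startsWith("gs://")) {
            throw new IllegalArgumentException("Not a gs:// path: " + gcsPath);
        }
        String rest = gcsPath.substring("gs://".length());
        int slash = rest.indexOf('/');
        // Everything before the first '/' is the bucket; "gs://bucket" has no slash at all.
        String bucket = slash < 0 ? rest : rest.substring(0, slash);
        if (bucket.isEmpty()) {
            throw new IllegalArgumentException("Missing bucket in path: " + gcsPath);
        }
        return bucket;
    }

    public static void main(String[] args) {
        // Prints: mysite-ga-datastreaming-196008-my-bucket
        System.out.println(bucketOf("gs://mysite-ga-datastreaming-196008-my-bucket/"));
    }
}
```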

Hi, Mangu's suggestion is correct. For the Cloud Storage staging location, you need to specify a folder inside the bucket, not just the bucket name.
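Following this advice, the staging location points at a folder inside the bucket rather than at the bucket root. The sketch below builds WordCount-style command-line arguments that way. It assumes the standard Dataflow options --runner, --project, --stagingLocation and --tempLocation; the staging and temp folder names, the YOUR_PROJECT_ID placeholder and the helper names are hypothetical.

```java
// Hypothetical helper: builds DataflowRunner arguments that use folders under one bucket.
class DataflowArgs {

    // Joins a bucket location and a folder name, e.g. ("gs://b/", "staging") -> "gs://b/staging".
    static String folder(String bucketLocation, String name) {
        String base = bucketLocation.endsWith("/") ? bucketLocation : bucketLocation + "/";
        return base + name;
    }

    static String[] build(String project, String bucketLocation) {
        return new String[] {
            "--runner=DataflowRunner",
            "--project=" + project,
            // Folders inside the bucket, never the bare bucket root.
            "--stagingLocation=" + folder(bucketLocation, "staging"),
            "--tempLocation=" + folder(bucketLocation, "temp"),
        };
    }

    public static void main(String[] args) {
        String[] pipelineArgs =
            build("YOUR_PROJECT_ID", "gs://mysite-ga-datastreaming-196008-my-bucket/");
        System.out.println(String.join(" ", pipelineArgs));
    }
}
```

The printed string can be pasted into the Program arguments field of the Eclipse run configuration for WordCount, or the array can be passed to PipelineOptionsFactory.fromArgs(...) in code.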

For all the details, see my post here: link
