Apache Beam Go SDK with Dataflow

I've been working with the Go Beam SDK (v2.13.0) and can't get the wordcount example working on GCP Dataflow. It enters a crash loop while trying to start org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. The example executes correctly when run locally with the Direct runner.

The example is completely unmodified from the original example given above.
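(For reference, since the linked example isn't reproduced here: it is roughly the shape of the minimal wordcount that ships with the Go SDK. The following is only a sketch under that assumption, using the standard v2.x package layout, not the exact file I ran.)

package main

import (
    "context"
    "flag"
    "fmt"
    "log"
    "regexp"

    "github.com/apache/beam/sdks/go/pkg/beam"
    "github.com/apache/beam/sdks/go/pkg/beam/io/textio"
    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
    "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

var (
    input  = flag.String("input", "gs://dataflow-samples/shakespeare/kinglear.txt", "File(s) to read.")
    output = flag.String("output", "", "Output file to write word counts to.")
)

var wordRE = regexp.MustCompile(`[a-zA-Z']+`)

func main() {
    flag.Parse()
    beam.Init() // must be called after flag.Parse and before building the pipeline

    p := beam.NewPipeline()
    s := p.Root()

    // Read lines, split into words, count occurrences, format, and write out.
    lines := textio.Read(s, *input)
    words := beam.ParDo(s, func(line string, emit func(string)) {
        for _, w := range wordRE.FindAllString(line, -1) {
            emit(w)
        }
    }, lines)
    counted := stats.Count(s, words)
    formatted := beam.ParDo(s, func(w string, c int) string {
        return fmt.Sprintf("%s: %v", w, c)
    }, counted)
    textio.Write(s, *output, formatted)

    // beamx.Run selects the runner from the --runner flag (direct, dataflow, ...).
    if err := beamx.Run(context.Background(), p); err != nil {
        log.Fatalf("Failed to execute job: %v", err)
    }
}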

The stack trace is:

org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.InvalidProtocolBufferException: Protocol message had invalid UTF-8. 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.InvalidProtocolBufferException.invalidUtf8(InvalidProtocolBufferException.java:148) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readStringRequireUtf8(CodedInputStream.java:2353) 
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec.<init>(RunnerApi.java:59611) 
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec.<init>(RunnerApi.java:59572) 
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec$1.parsePartialFrom(RunnerApi.java:60241) 
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec$1.parsePartialFrom(RunnerApi.java:60235) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder.<init>(RunnerApi.java:27531) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder.<init>(RunnerApi.java:27489) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$1.parsePartialFrom(RunnerApi.java:28410) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$1.parsePartialFrom(RunnerApi.java:28404) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$Builder.mergeFrom(RunnerApi.java:28028) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$Builder.mergeFrom(RunnerApi.java:27868) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2408) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntryLite.parseField(MapEntryLite.java:128) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntryLite.parseEntry(MapEntryLite.java:184) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry.<init>(MapEntry.java:106) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry.<init>(MapEntry.java:50) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:70) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:64) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:930) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:848) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2714) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2708) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:2892) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:2850) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:3981) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:3975) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:221) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:239) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:244) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) 
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:311) 
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.parseFrom(RunnerApi.java:3222) 
at org.apache.beam.runners.dataflow.worker.DataflowWorkerHarnessHelper.getPipelineFromEnv(DataflowWorkerHarnessHelper.java:131) 
at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:59) 

I was using the Docker image specified in the example, and I also tried my own Docker image built with the same tag (v2.13.0), but I still get the same error. I realize the Go SDK isn't production ready, but I'm hoping the samples should work.

As per the instructions in the getting started guide, I ran the job like this:

wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://example-bucket/counts \
--runner dataflow \
--project example-project \
--temp_location gs://example-bucket/tmp/ \
--staging_location gs://example-bucket/binaries/ \
--worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515

Again, I tried both the Docker image provided in the getting started guide and one built using v2.13.0.

My go.mod for the sample is:

module example.org/wordcount

go 1.12

require (
    cloud.google.com/go v0.41.0 // indirect
    github.com/apache/beam v2.13.0+incompatible
    github.com/pkg/errors v0.8.1 // indirect
    golang.org/x/net v0.0.0-20190628185345-da137c7871d7 // indirect
    google.golang.org/grpc v1.22.0 // indirect
)

What could be causing this?

Dataflow doesn't officially support the Apache Beam Go SDK, though some users have been able to use it. I suspect that this particular release may have had issues; you may want to try a different version.

You could also ask other users on the Beam mailing list which versions have worked for them (still unsupported, though).

I had the same problem. Upgrading the Beam version to 2.19 worked for me.

It looks like some of the Beam Fn API protos were rearranged in a backwards-incompatible way, so you need to make sure you're on a recent version of the Beam Go SDK for it to work on Dataflow.
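For concreteness, a minimal sketch of the upgrade, assuming the module path stays github.com/apache/beam (its pre-go.mod tags resolve as +incompatible):

go get github.com/apache/beam@v2.19.0

which should leave go.mod with:

require github.com/apache/beam v2.19.0+incompatible

If you pass --worker_harness_container_image explicitly, it presumably needs to match the new SDK version as well, rather than the 2018 snapshot image shown in the question.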
