Error while running a Beam streaming pipeline (Python) with Pub/Sub IO in the embedded FlinkRunner (apache_beam[gcp])
I get the following error when running a streaming pipeline (Python) in Apache Beam on the FlinkRunner. The pipeline contains a GCP Pub/Sub IO source and a Pub/Sub sink.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.6 interpreter.
ERROR:root:java.lang.IllegalArgumentException: PCollectionNodes [PCollectionNode{id=ref_PCollection_PCollection_1, PCollection=unique_name: "23 Read from Pub/Sub/Read.None"
coder_id: "ref_Coder_BytesCoder_1"
is_bounded: UNBOUNDED
windowing_strategy_id: "ref_Windowing_Windowing_1"
}] were consumed but never produced
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/usr/local/lib64/python3.6/site-packages/apache_beam/pipeline.py", line 586, in __exit__
    self.result.wait_until_finish()
  File "/usr/local/lib64/python3.6/site-packages/apache_beam/runners/portability/portable_runner.py", line 599, in wait_until_finish
    raise self._runtime_exception
RuntimeError: Pipeline BeamApp-swarna0kpaul-0712135603-763999c_45da372e-757d-4690-8e25-1a5ed0a5cc84 failed in state FAILED: java.lang.IllegalArgumentException: PCollectionNodes [PCollectionNode{id=ref_PCollection_PCollection_1, PCollection=unique_name: "23 Read from Pub/Sub/Read.None"
coder_id: "ref_Coder_BytesCoder_1"
is_bounded: UNBOUNDED
windowing_strategy_id: "ref_Windowing_Windowing_1"
}] were consumed but never produced
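I am trying to run the following code in Python, using two Pub/Sub topics ({input topic}, {output topic}) that I created in my GCP account. The topics are in this format: projects/{project name}/topics/{topic name}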
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders: substitute your own project and topic names.
input_topic = "projects/{project name}/topics/{input topic name}"
output_topic = "projects/{project name}/topics/{output topic name}"

options = PipelineOptions(["--runner=FlinkRunner", "--checkpointing_interval=1000", "--streaming"])

with beam.Pipeline(options=options) as pipeline:
    # Read raw message payloads (bytes) from the input topic.
    input1 = pipeline | " Read from Pub/Sub" >> beam.io.ReadFromPubSub(topic=input_topic).with_output_types(bytes)
    # Window into 5-second fixed windows, then publish to the output topic.
    output = (input1
              | beam.WindowInto(beam.transforms.window.FixedWindows(5))
              | "Write to Pub/Sub" >> beam.io.WriteToPubSub(topic=output_topic, with_attributes=False).with_input_types(bytes))
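As a side note on the topic format mentioned above, here is a small sketch for building and checking a fully qualified topic path before handing it to the pipeline. The helper names (topic_path, parse_topic_path) and the regular expression are my own assumptions for illustration and are not part of Beam.

```python
import re

# Assumed pattern for illustration only: projects/<project>/topics/<topic>
_TOPIC_RE = re.compile(r"^projects/([^/]+)/topics/([^/]+)$")


def topic_path(project: str, topic: str) -> str:
    """Build a topic path of the form projects/<project>/topics/<topic>."""
    path = f"projects/{project}/topics/{topic}"
    if not _TOPIC_RE.match(path):
        raise ValueError(f"invalid topic path: {path!r}")
    return path


def parse_topic_path(path: str) -> tuple:
    """Split a topic path into (project, topic); raise ValueError if malformed."""
    match = _TOPIC_RE.match(path)
    if match is None:
        raise ValueError(f"invalid topic path: {path!r}")
    return match.group(1), match.group(2)
```

For example, topic_path("my-project", "input") could be assigned to input_topic in the code above ("my-project" and "input" are made-up names).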
The system has the following software versions:
Python 3.6.8
apache_beam[gcp]==2.30.0
java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
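I also tried running it with a Flink cluster and the portable Flink runner, following the specification on this page, but got the same error.
The same code runs fine when I use the following options: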
options = PipelineOptions(["--streaming"])
The apache_beam.io.ReadFromPubSub() transform only works with the DirectRunner and the Dataflow runner, but you can try the external transform apache_beam.io.external.gcp.pubsub.ReadFromPubSub instead. See: https://github.com/apache/beam/blob/release-2.39.0/sdks/python/apache_beam/io/external/gcp/pubsub.py#L39
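A minimal sketch of that suggestion, not tested against Beam 2.30.0. It assumes the external (cross-language) ReadFromPubSub and WriteToPubSub in apache_beam.io.external.gcp.pubsub accept topic and with_attributes as in the linked file, and that Java is available, since these transforms are expanded by a Java expansion service. flink_args and run are hypothetical helper names introduced here.

```python
def flink_args(checkpointing_interval_ms=1000):
    """Pipeline options from the question, as a list of command-line flags."""
    return [
        "--runner=FlinkRunner",
        f"--checkpointing_interval={checkpointing_interval_ms}",
        "--streaming",
    ]


def run(input_topic, output_topic, argv=None):
    """Read bytes from one topic, window them, and publish to another topic."""
    # Imported here so flink_args() can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io.external.gcp.pubsub import ReadFromPubSub, WriteToPubSub
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.window import FixedWindows

    options = PipelineOptions(argv or flink_args())
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # External transform instead of beam.io.ReadFromPubSub
            | "Read from Pub/Sub" >> ReadFromPubSub(topic=input_topic)
            | beam.WindowInto(FixedWindows(5))
            # External counterpart of beam.io.WriteToPubSub
            | "Write to Pub/Sub" >> WriteToPubSub(topic=output_topic, with_attributes=False)
        )
```

Calling run() with the two topic paths from the question would start the pipeline. Whether the default expansion service starts automatically may depend on the Beam version and the local Java setup, so treat this as a starting point.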