Dataflow not reading PubSub messages when running in Dataflow Managed Service
Our Python Dataflow pipeline works locally but not when deployed using the Dataflow managed service on Google Cloud Platform. It shows no sign of being connected to the PubSub subscription. We have tried reading from both the subscription and the topic; neither worked. The messages accumulate in the PubSub subscription, and the Dataflow pipeline shows no sign of being invoked. We have double-checked that the project is the same.
Any pointers on this would be very much appreciated.
Here is the code to connect to a pull subscription:
with beam.Pipeline(options=options) as p:
    something = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
        subscription="projects/PROJECT_ID/subscriptions/cloudflow"
    )
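Since the question mentions trying both the subscription and the topic, one thing that can be ruled out cheaply is a malformed or mixed-up path. The following is a minimal sketch using only the standard library; the helper names and the "my-project" placeholder are hypothetical, and the code only checks the shape of the string, it does not contact Pub/Sub.

```python
import re
from typing import Tuple

# Shape of a fully qualified subscription path, as used in the snippet above.
_SUBSCRIPTION_RE = re.compile(r"^projects/([^/]+)/subscriptions/([^/]+)$")


def subscription_path(project_id: str, subscription_name: str) -> str:
    """Build 'projects/<project>/subscriptions/<name>' from its two parts."""
    return f"projects/{project_id}/subscriptions/{subscription_name}"


def parse_subscription_path(path: str) -> Tuple[str, str]:
    """Split a subscription path into (project_id, subscription_name).

    Raises ValueError when the path does not have the expected shape,
    for example when a topic path was passed by mistake.
    """
    match = _SUBSCRIPTION_RE.match(path)
    if match is None:
        raise ValueError(f"not a subscription path: {path!r}")
    return match.group(1), match.group(2)
```

Parsing the configured path at startup and logging the extracted project ID makes it easier to confirm that the job and the subscription really live in the same project.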
Here are the options used:
options = PipelineOptions()
file_processing_options = PipelineOptions().view_as(FileProcessingOptions)
if options.view_as(GoogleCloudOptions).project is None:
    print(sys.argv[0] + ": error: argument --project is required")
    sys.exit(1)
options.view_as(SetupOptions).save_main_session = True
options.view_as(StandardOptions).streaming = True
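The options shown do not say which runner, region, or temp location the job uses; presumably those arrive on the command line. As a sketch only, and assuming nothing about how the original job was actually launched, this is one way to assemble the flags a streaming job on the Dataflow runner usually needs. The function name and all argument values are hypothetical.

```python
from typing import List


def dataflow_streaming_args(project: str, region: str, temp_location: str) -> List[str]:
    """Return command-line style flags for a streaming job on the Dataflow runner."""
    return [
        "--runner=DataflowRunner",  # without a runner flag, Beam uses the local DirectRunner
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        "--streaming",
    ]
```

The resulting list can be passed to PipelineOptions as its flags argument, after which the view_as(...) calls above work unchanged.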
The PubSub subscription has this configuration:
Delivery type: Pull
Subscription expiration: Subscription expires in 31 days if there is no activity.
Acknowledgement deadline: 57 Seconds
Subscription filter: —
Message retention duration: 7 Days
Retained acknowledged messages: No
Dead lettering: Disabled
Retry policy: Retry immediately
Very late answer, but it may still help someone else. I had the same problem and solved it like this:
Open the Logs tab in the Dataflow Job UI, section Job Logs.
Click the "View in Logs Explorer" button.
In the new Logs Explorer screen, in the Query window, remove all the existing "logName" filters and keep only resource.type and resource.labels.job_id.
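After the last step, the remaining query should look roughly like the string built by the sketch below. "dataflow_step" is the monitored resource type that Cloud Logging uses for Dataflow jobs; the function name and the JOB_ID placeholder are hypothetical.

```python
def dataflow_job_log_filter(job_id: str) -> str:
    """Build a Logs Explorer query matching every log entry of one Dataflow job."""
    return "\n".join([
        'resource.type="dataflow_step"',
        f'resource.labels.job_id="{job_id}"',
    ])


# Example: print the query for a placeholder job ID.
print(dataflow_job_log_filter("JOB_ID"))
```

With no logName restriction, the query is no longer limited to the job-level log, so errors logged elsewhere for the same job become visible.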
I think that to pull from a subscription we need to pass the with_attributes parameter as True.
with_attributes: if True, output elements will be PubsubMessage objects; if False, output elements will be of type bytes (message data only).
I found a similar question here: When using Beam IO ReadFromPubSub module, can you pull messages with attributes in Python? It's unclear if it's supported.
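Whether this flag fixes the original problem is not established here, but it does change the element type that downstream steps receive. The sketch below handles both shapes; the function name is hypothetical, and any non-bytes element is assumed to expose .data and .attributes the way PubsubMessage does.

```python
from typing import Any, Dict, Tuple


def split_element(element: Any) -> Tuple[bytes, Dict[str, str]]:
    """Return (data, attributes) for either element shape.

    A bytes element (with_attributes=False) carries no attributes, so an
    empty dict is returned. Any other element is assumed to expose
    `.data` and `.attributes`, as PubsubMessage does.
    """
    if isinstance(element, bytes):
        return element, {}
    return element.data, dict(element.attributes or {})
```

In a pipeline this could be applied with beam.Map(split_element) directly after the ReadFromPubSub step, so later transforms do not depend on which value of with_attributes was chosen.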