
Dataflow not reading PubSub messages when running in Dataflow Managed Service

Our Python Dataflow pipeline works locally but not when deployed using the Dataflow managed service on Google Cloud Platform. It shows no sign of being connected to the PubSub subscription. We have tried reading from both the subscription and the topic; neither worked. Messages accumulate in the PubSub subscription, and the Dataflow pipeline shows no sign of consuming them. We have double-checked that the project is the same.

Any directions on this would be very much appreciated.

Here is the code to connect to a pull subscription:

import apache_beam as beam

with beam.Pipeline(options=options) as p:
    # Read messages from the Pub/Sub pull subscription
    something = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
        subscription="projects/PROJECT_ID/subscriptions/cloudflow"
    )

Here are the options used:

import sys
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, SetupOptions, StandardOptions)

options = PipelineOptions()
# FileProcessingOptions is a custom options subclass defined elsewhere in the project
file_processing_options = options.view_as(FileProcessingOptions)
if options.view_as(GoogleCloudOptions).project is None:
    print(sys.argv[0] + ": error: argument --project is required")
    sys.exit(1)
options.view_as(SetupOptions).save_main_session = True
options.view_as(StandardOptions).streaming = True
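
Since PipelineOptions() with no arguments parses the command-line flags, the runner and staging settings come from the launch command. A rough, hypothetical equivalent written directly in code (the region and bucket values here are placeholders, not taken from our actual setup) would be:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="PROJECT_ID",
    region="us-central1",              # placeholder region
    temp_location="gs://BUCKET/temp",  # placeholder staging bucket
    streaming=True,
)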

The PubSub subscription has this configuration:

Delivery type: Pull
Subscription expiration: Subscription expires in 31 days if there is no activity.
Acknowledgement deadline: 57 Seconds
Subscription filter: —
Message retention duration: 7 Days
Retained acknowledged messages: No
Dead lettering: Disabled
Retry policy: Retry immediately

Very late answer, but it may still help someone else. I had the same problem and solved it like this:

  1. Thanks to user Paramnesia1 who wrote this answer, I figured out that I was not seeing all the logs in Logs Explorer. Some default job_name query filters were hiding them from me. I am quoting and clarifying the steps to follow to be able to see all the logs:

Open the Logs tab in the Dataflow Job UI, section Job Logs

Click the "View in Logs Explorer" button

In the new Logs Explorer screen, in your Query window, remove all the existing "logName" filters and keep only resource.type and resource.labels.job_id (see the example query after this list)

  2. Now you will be able to see all the logs and investigate your error further. In my case, I was getting some 'Syncing Pod' errors, which were due to importing the wrong data file in my setup.py.
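
For reference, after removing the logName filters, the remaining query would look roughly like this (the job_id value below is a placeholder for your own job ID):

resource.type="dataflow_step"
resource.labels.job_id="JOB_ID"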

I think for pulling from a subscription we need to pass the with_attributes parameter as True.

with_attributes – True: output elements will be PubsubMessage objects. False: output elements will be of type bytes (message data only).
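
A minimal sketch of that read, under my assumption of how it is meant to be used (the subscription path is the same placeholder as in the question):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    messages = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
        subscription="projects/PROJECT_ID/subscriptions/cloudflow",
        with_attributes=True,  # elements are PubsubMessage objects instead of raw bytes
    )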

Found a similar one here: When using Beam IO ReadFromPubSub module, can you pull messages with attributes in Python? It's unclear if it's supported.
