
IoT pipeline in GCP

I have an IoT pipeline in GCP that is structured like this:

IoT Core -> Pub/Sub -> Dataflow -> BigQuery

I am using ESP32 devices to send data, with new data sent every 2 seconds. For now I am testing with only 4 devices, but ultimately the project will consist of hundreds of ESP32 devices, each sending data every 2 seconds.

The issue is that even with 4 devices, the unacked message count on the subscription goes up to 1260 messages. These messages are not lost, only delayed, but this could cause problems once I have hundreds of devices. So I need to alter my pipeline so that the data can be stored without such a delay.

The data is sent in CSV format. It is converted to JSON in Dataflow using a JavaScript UDF, then loaded into BigQuery using the Google-provided template "Pub/Sub to BigQuery". All devices use the same Pub/Sub topic and subscription, and data from all devices is written to the same BigQuery table.

If it helps, it is also possible to store the data somewhere else first, such as Cloud Storage (if that is faster), and then load it all into BigQuery later (every hour or so), but ultimately I need all my data to be in BigQuery. Please suggest how I can improve my pipeline.
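For readers unfamiliar with the CSV-to-JSON step mentioned above, here is a minimal sketch of what such a JavaScript UDF might look like. The column names and their order are assumptions made purely for illustration (the post does not list the actual fields), and the function name `transform` is likewise hypothetical. The code sticks to ES5 syntax to stay on the safe side with the template's JavaScript engine.

```javascript
// Hypothetical column order of the CSV payload; the original post does not
// say which fields the devices send.
var COLUMNS = ["device_id", "timestamp", "temperature", "humidity"];

/**
 * Converts one CSV line (the Pub/Sub message payload) into a JSON string
 * whose keys are assumed to match the BigQuery column names.
 */
function transform(line) {
  var values = line.trim().split(",");
  if (values.length !== COLUMNS.length) {
    // Reject malformed rows instead of writing misaligned columns.
    throw new Error(
      "Expected " + COLUMNS.length + " fields, got " + values.length
    );
  }

  var row = {};
  for (var i = 0; i < COLUMNS.length; i++) {
    row[COLUMNS[i]] = values[i].trim();
  }

  // Assumed numeric fields; everything else stays a string.
  row.temperature = parseFloat(row.temperature);
  row.humidity = parseFloat(row.humidity);

  return JSON.stringify(row);
}
```

With the Google-provided template, the script's Cloud Storage path and the function name are supplied through the `javascriptTextTransformGcsPath` and `javascriptTextTransformFunctionName` parameters.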

The problem was that every 10 seconds Pub/Sub resent any message that had not yet been acknowledged. With several devices each publishing every 2 seconds, these redelivered copies made the total number of messages grow rapidly. So I increased this wait time (the subscription's acknowledgement deadline) to 30 seconds and the system calmed down. Now no large backlog of unacknowledged messages builds up when I run the pipeline.
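The effect described in this answer can be illustrated with a rough back-of-the-envelope model. This is only a sketch: it assumes a message is redelivered each time the acknowledgement deadline passes without an ack, and it ignores retry backoff and any deadline extension done by the subscriber. The 25-second processing time is a made-up figure, not a measurement from the post.

```javascript
// Messages published per second by `devices` devices that each send one
// message every `intervalSeconds` seconds.
function messagesPerSecond(devices, intervalSeconds) {
  return devices / intervalSeconds;
}

// Simplified model: how many times a single message is delivered when the
// subscriber needs `processingSeconds` to acknowledge it and a redelivery
// happens every `ackDeadlineSeconds` until the ack arrives.
function deliveriesPerMessage(processingSeconds, ackDeadlineSeconds) {
  return Math.max(1, Math.ceil(processingSeconds / ackDeadlineSeconds));
}

var rate = messagesPerSecond(4, 2); // 4 devices, one message every 2 s
var processingSeconds = 25; // hypothetical end-to-end time before the ack

// Deliveries per second with a 10 s deadline versus a 30 s deadline.
console.log(rate * deliveriesPerMessage(processingSeconds, 10)); // 6
console.log(rate * deliveriesPerMessage(processingSeconds, 30)); // 2
```

Under these assumptions, raising the deadline above the processing time brings the delivery rate back down to the publish rate, which is consistent with the backlog disappearing after the change to 30 seconds.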


 