
BigQuery streaming inserts fail on GKE

We have a GKE cluster with 3 n2-highcpu-8 nodes and a web application written in Go, scaled to 3 instances (one per node), which writes every request to BigQuery using streaming inserts. I noticed some rather weird behaviour:

During periods of high application load, 2 out of the 3 instances start failing 100% of their streaming writes, with nothing but "context deadline exceeded" as the error. When I delete those 2 pods and they come back and start receiving traffic again, the old instance starts failing with "context deadline exceeded", while one of the new 2 continues writing data successfully and the other starts failing.

I went through the quotas and limits in the BigQuery documentation and haven't found anything that might be related to this case. Looking at Stackdriver Monitoring, the number of writes per table per second is around 1,500, and the payloads sent are also quite small (1-5 KB). We don't use batch writes, so inserts are mostly done from goroutines as soon as a request comes in.
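For context, the write path is roughly the following (a minimal Go sketch assuming the cloud.google.com/go/bigquery client; the project, dataset, table and row struct are placeholders, not our real names):

```go
// Sketch of the write path: one streaming insert per incoming request,
// fired from a goroutine with a per-call timeout.
package main

import (
	"context"
	"log"
	"time"

	"cloud.google.com/go/bigquery"
)

// requestRow is a placeholder for the small (1-5 KB) payload written per request.
type requestRow struct {
	Path      string    `bigquery:"path"`
	CreatedAt time.Time `bigquery:"created_at"`
}

func main() {
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatalf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	inserter := client.Dataset("my_dataset").Table("requests").Inserter()

	// In the real application this runs in a goroutine per request; the
	// timeout on this context is what surfaces as "context deadline exceeded"
	// when the insert does not complete in time.
	go func() {
		ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
		defer cancel()

		row := requestRow{Path: "/checkout", CreatedAt: time.Now()}
		if err := inserter.Put(ctx, row); err != nil {
			log.Printf("streaming insert failed: %v", err)
		}
	}()

	time.Sleep(15 * time.Second) // keep the sketch alive long enough for the insert
}
```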

BigQuery Logging doesn't show any errors or warnings either.

Is there some hidden limitation, or are BigQuery streaming writes in general only suitable for a small number of simultaneous writers, so that we need a queueing solution using Pub/Sub and Dataflow to transport data to BigQuery at high volume? A sketch of what that producer side might look like is below.
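For reference, the producer side of such a queue would look roughly like this (a sketch assuming the cloud.google.com/go/pubsub client; the project ID and topic name are placeholders), with a Dataflow job draining the topic into BigQuery:

```go
// Sketch of the alternative: publish each request to a Pub/Sub topic instead
// of streaming straight into BigQuery.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	client, err := pubsub.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatalf("pubsub.NewClient: %v", err)
	}
	defer client.Close()

	topic := client.Topic("request-events") // placeholder topic name

	// Publish is asynchronous and batches messages internally, which smooths
	// out spikes that would otherwise hit BigQuery directly.
	res := topic.Publish(ctx, &pubsub.Message{Data: []byte(`{"path":"/checkout"}`)})
	if _, err := res.Get(ctx); err != nil {
		log.Printf("publish failed: %v", err)
	}

	topic.Stop() // flush any pending messages before exit
}
```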

The GKE cluster and the BigQuery dataset are both located in europe-west-2, and this happens every day.

[EDIT]

Here are some streaming statistics from one of the biggest tables, in case it makes any difference:

Streaming buffer statistics:
Estimated size: 249.57 MB
Estimated rows: 1,640,220
Earliest entry time: 3 Dec 2020, 18:43:00

Actually, the issue was related to a misconfiguration of the application's affinity settings: 2 pods were running on the same GKE node, which during primetime consumed 100% of its CPU, and that appears to have been the real cause. After that was sorted out, we haven't seen any context deadline messages or errors when writing to BigQuery.
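For anyone hitting the same thing, the fix was along the lines of a podAntiAffinity rule on the Deployment so the replicas land on different nodes (a sketch only; the "web-app" label value is a placeholder):

```yaml
# Spread the 3 replicas across nodes so no node carries two CPU-heavy pods.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
```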

