简体   繁体   English

Beanstalkd prot.c:710 in check_err: read(): Connection reset by peer - k8s 集群中的慢速套接字通信

[英]Beanstalkd prot.c:710 in check_err: read(): Connection reset by peer - slow socket communication in k8s cluster

In our kubernetes cluster we have 3 BE services (hosted on GCP).在我们的 kubernetes 集群中,我们有 3 个 BE 服务(托管在 GCP 上)。

We have 1 instance of Beanstalkd service & 1 ES cluster (3 master pods, 2 data pods, 2 client pods).我们有 1 个 Beanstalkd 服务实例和 1 个 ES 集群(3 个主 pod,2 个数据 pod,2 个客户端 pod)。 Each BE app has its own Redis instance and Mysql server.每个 BE 应用程序都有自己的 Redis 实例和 Mysql 服务器。

Now we built various health checks and we regularly see (like 5-10 times a day) connection speed issues.现在我们建立了各种健康检查,我们经常看到(比如每天 5-10 次)连接速度问题。 In the health check all we do is connect to the service and check how long this takes (via php) and if it takes too long (or doesn't connect at all) we raise a flag.在运行状况检查中,我们所做的只是连接到服务并检查这需要多长时间(通过 php),如果需要的时间太长(或根本没有连接),我们会提出一个标志。

We see results like:我们看到如下结果:

Test took 10.616209983826 sec at 2020-04-28 23:45:11.
Environment: production
Test took too long - more than 2 seconds.
Checked items and their timing:
pdo took 0.0043079853057861 seconds
beanstalkd took 10.036059141159 seconds
redis took 0.0028140544891357 seconds
ElasticSeearch took 0.57300901412964 seconds

Sometimes it is Redis taking this long, sometimes it is ES.有时是 Redis 花了这么长时间,有时是 ES。 But mostly it is Beanstalkd (up to 20 seconds.): This is a code sample how we check it:但主要是 Beanstalkd(最多 20 秒。):这是我们如何检查它的代码示例:

$startbeanstalkd = microtime(true);
// connect to Beanstalkd
try {
    $queue = new Beanstalk(
        [
            'host' => $config->beanstalkd->host,
            'port' => $config->beanstalkd->port
        ]
    );
    $queue->connect();
} catch (\Exception $e) {
    $testPassed = false;
    $testResult['Beanstalkd']['status'] = false;
    $testResult['Beanstalkd']['message'] = $e->getMessage();
    $testResult['Beanstalkd']['connectionDetails'] = json_encode($config->beanstalkd);
}
if($testPassed) $queue->disconnect();
$beantime = microtime(true);
$timetrack['beanstalkd'] = $beantime - $startbeanstalkd;

We noticed this error from beanstalkd regulary:我们经常从 beanstalkd 中注意到这个错误:

/usr/bin/beanstalkd: prot.c:710 in check_err: read(): Connection reset by peer

But searching on this hasn't really given us much info.但是对此进行搜索并没有真正为我们提供太多信息。

Note really relevant I think but we use Rancher 2 to manage our clusters.请注意我认为确实相关,但我们使用 Rancher 2 来管理我们的集群。

First I'd love to know what the beanstalkd error means and how to solve it首先,我很想知道 beanstalkd 错误是什么意思以及如何解决它

Then secondly would love to hear any suggestion on the general sockets time out / are slow-as-shit problem.其次,很想听听关于一般 sockets 超时/慢得像狗屎问题的任何建议。

Thank you!谢谢!

Problem was that we upgraded pheanstalk (php lib to veresion 4 from 3.x) to a new version and their documentation is/was bad.问题是我们将 pheanstalk(php lib 从 3.x 升级到版本 4)到新版本,并且他们的文档是/很糟糕。

A previous accepted connection method now wasn't accepted anymore but the method name was the same, just it now requires an argument but it doesn't give any error if you don't provide that argument.以前接受的连接方法现在不再被接受,但方法名称相同,只是它现在需要一个参数,但如果您不提供该参数,它不会给出任何错误。 Now if you don't give that argument you create zombie connections.现在,如果你不给出那个论点,你就会创建僵尸连接。 So we were racking up zombie connections eating up all sockets in whole cluster bringing everything to a stop.因此,我们建立了僵尸连接,吞噬了整个集群中的所有 sockets,使一切都停止了。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM