
Mongo connection count creeping up by one every 10 seconds with the mgo driver

We monitor our MongoDB connection count using this:

http://godoc.org/labix.org/v2/mgo#GetStats

However, we have been facing a strange connection leak where the connectionCount creeps up consistently by one open connection every 10 seconds, regardless of whether there are any requests. I can spin up a server on localhost, leave it there, do nothing, and the connectionCount will still creep up. The connection count eventually reaches a few thousand, at which point it kills the app/db and we have to restart the app.

This might not be enough information for you to debug. Does anyone have any ideas, or connection leaks that you have dealt with in the past? How did you debug them? What are some ways I can debug this?

We have tried a few things. We scanned our code base for any code that could open a connection and put counters/debugging statements there, and so far we have found no leak. It is almost as if there is a leak in a library somewhere.

This is a bug in a branch that we have been working on, and there have been a few hundred commits to it. We have done a diff between this branch and master and couldn't find why there is a connection leak in this branch.

As an example, here is the data set that I am referring to:

Clusters:      1   
MasterConns:   9936      <-- creeps up 1 per 10 seconds
SlaveConns:    -7359     <-- why is this negative?
SentOps:       42091780   
ReceivedOps:   38684525   
ReceivedDocs:  39466143   
SocketsAlive:  78        <-- what is the difference between the socket count and the master conns count?
SocketsInUse:  1231   
SocketRefs:    1231

MasterConns is the number that creeps up by one every 10 seconds. I am not entirely sure what the other numbers mean.

MasterConns cannot tell you whether there's a leak or not, because it never decreases. The field indicates the number of connections made since the last statistics reset, not the number of sockets that are currently in use. The latter is indicated by the SocketsAlive field.
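To make the counter-versus-gauge distinction concrete, here is a minimal sketch that compares two statistics samples and looks only at SocketsAlive. The `snapshot` type and the `aliveGrowth` helper are hypothetical stand-ins so that the example runs without the driver; in a real program the values would presumably be copied from the result of mgo.GetStats(). The sample numbers are made up for illustration.

```go
package main

import "fmt"

// snapshot is a local stand-in (an assumption, not the driver's type) for the
// two statistics fields discussed above.
type snapshot struct {
	MasterConns  int // cumulative: connections made since the last stats reset
	SocketsAlive int // gauge: connections that are open right now
}

// aliveGrowth reports how much the number of live sockets changed between two
// samples. MasterConns is deliberately ignored because it never decreases.
func aliveGrowth(prev, cur snapshot) int {
	return cur.SocketsAlive - prev.SocketsAlive
}

func main() {
	// Hypothetical samples taken 10 seconds apart.
	prev := snapshot{MasterConns: 9935, SocketsAlive: 78}
	cur := snapshot{MasterConns: 9936, SocketsAlive: 78}

	fmt.Println("MasterConns delta: ", cur.MasterConns-prev.MasterConns)
	fmt.Println("SocketsAlive delta:", aliveGrowth(prev, cur))
}
```

In this made-up pair of samples MasterConns grows while SocketsAlive stays flat, which is the pattern that would suggest connections are being opened and closed rather than leaked.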

To give you some additional relief on the subject, every single test in the mgo suite is wrapped in logic that ensures the statistics show sane values after the test finishes, so that potential leaks don't go unnoticed. That's the main reason why this statistics collection system was introduced.

The reason you see this number increasing every 10 seconds or so is the internal activity the driver performs to learn the status of the cluster. That said, this behavior was recently changed so that it doesn't establish new connections and instead picks existing sockets from the pool, so I believe you're not using the latest release.

A negative SlaveConns looks like a bug. There's a small edge case in the statistics collection for connections made, because we cannot tell whether a given server is a master or a slave before we've talked to it, so there might be an uncovered path. If you still see that behavior after you upgrade, please report the issue and I'll be happy to look at it.

SocketsInUse is the number of sockets that are still being referenced by one or more sessions, whether they are alive (the connection is established) or not. SocketsAlive is, again, the real number of live TCP connections. The delta between the two indicates that a number of sessions were not closed. This may be okay if they are still being held in memory by the application and will eventually be closed, or it may be a leak if the application missed a session.Close call.
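As a rough illustration of that delta, the sketch below plugs in the SocketsInUse and SocketsAlive values from the question. The `socketStats` type and the `inUseMinusAlive` helper are hypothetical names introduced for this example, not part of mgo; they only mirror the two field names so the code runs on its own.

```go
package main

import "fmt"

// socketStats is a local stand-in (an assumption, not the driver's type) for
// the two socket fields discussed above.
type socketStats struct {
	SocketsAlive int // live TCP connections
	SocketsInUse int // sockets still referenced by one or more sessions
}

// inUseMinusAlive returns the delta between sockets still referenced by
// sessions and sockets that are actually alive. A large positive value hints
// that sessions are being held (or were never closed) after their
// connections went away.
func inUseMinusAlive(s socketStats) int {
	return s.SocketsInUse - s.SocketsAlive
}

func main() {
	// Values taken from the statistics dump in the question.
	s := socketStats{SocketsAlive: 78, SocketsInUse: 1231}
	fmt.Println("in use but not alive:", inUseMinusAlive(s)) // prints 1153
}
```

A delta of this size is only a hint. Whether it is a real leak depends on whether the application still holds those sessions and will eventually close them.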
