MongoDB findAndModify queue is very slow

Question

I am using MongoDB as a queue, with PHP-Queue as the way to get the data out. This is a proof of concept running on an OS X machine. I am seeing very slow performance from Mongo, specifically from the findAndModify command. I did a lot of testing on the PHP side, and PHP processing accounts for only about 5% of the time.

When I fill the Mongo collection with, say, 10,000 messages, it fills very quickly, on the order of 3-5 seconds. But emptying it takes about 250 seconds, and only about 10 seconds of that is spent on the PHP side. Watching the mongod process, its memory never goes above about 60 MB, but its CPU usage stays above 90% the entire time. I have indexed the collection; below is a sample of the message data as well as the index.

Example message (this is one of 10,000 similar messages in the queue):

{
  "_id": ObjectId("526c47d5c5008c1d5cd63ef8"),
  "payload": {
    "0": {
      "EVENT_HEADER_KEY": NumberInt(9094775),
      "event_name": "Account Change",
      "source_name": "Work",
      "event_category_name": "Complex Events",
      "EVENT_TIMESTAMP": "Aug 17 2013 12:00:00:000AM",
      "PARENT_HEADER_KEY": null,
      "year": NumberInt(2013),
      "month": NumberInt(10),
      "Company_Name": "ACME PRODUCTS, INC.",
      "Company_Email": "blabla",
      "Company_Phone": "555-555-5555",
      "First_Name": "Jon",
      "Last_Name": "Doe",
      "ID_NUMBER": "111111111",
      "created_by": "Load Job Name",
      "created_at": "Oct 18 2013 04:07:31:140PM",
      "product_analytical_category": "blabla",
      "_Event_Type": "blabla",
      "CUSTOMER_ID": "111111111"
    }
  },
  "running": false,
  "resetTimestamp": ISODate("2038-01-19T03:14:07.0Z"),
  "earliestGet": ISODate("1970-01-01T00:00:00.0Z"),
  "priority": 0,
  "created": ISODate("2013-10-26T22:53:09.440Z")
}

Index of this collection, which seems to have been created automatically:

{
   "_id": NumberInt(1)
}

Checking mongo.log, I can see that as I empty the queue, each message takes only about 1 millisecond for a run of about 70 messages. Then the opid changes and there is a 300-900 ms delay, after which it continues under the new opid at the same pace of about 1 millisecond per message. These opid changes account for about 50-100 seconds of the 250 seconds of processing time, so something else is still going on.

Excerpt from mongo.log:

**Sat Oct 26 15:15:25.189** [conn4] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: test.abe top: { **opid: 20064**, active: true, secs_running: 0, op: "query", ns: "test", query: { findandmodify: "abe", query: { running: false, earliestGet: { $lte: new Date(1382825725143) } }, update: { $set: { resetTimestamp: new Date(1382825785000), running: true } }, fields: { payload: 1 }, sort: { priority: 1, created: 1 } }, client: "127.0.0.1:53045", desc: "conn4", threadId: "0x119024000", connectionId: 4, locks: { ^: "w", ^test: "W" }, waitingForLock: false, numYields: 0, lockStats: { timeLockedMicros: {}, timeAcquiringMicros: { r: 0, w: 3 } } }

**Sat Oct 26 15:15:25.190** [conn4] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: test.abe top: { **opid: 20064**, active: true, secs_running: 0, op: "query", ns: "test", query: { findandmodify: "abe", query: { running: false, earliestGet: { $lte: new Date(1382825725143) } }, update: { $set: { resetTimestamp: new Date(1382825785000), running: true } }, fields: { payload: 1 }, sort: { priority: 1, created: 1 } }, client: "127.0.0.1:53045", desc: "conn4", threadId: "0x119024000", connectionId: 4, locks: { ^: "w", ^test: "W" }, waitingForLock: false, numYields: 0, lockStats: { timeLockedMicros: {}, timeAcquiringMicros: { r: 0, w: 3 } } }

**Sat Oct 26 15:15:25.507** [conn4] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: test.abe top: { **opid: 20141**, active: true, secs_running: 0, op: "query", ns: "test", query: { findandmodify: "abe", query: { running: false, earliestGet: { $lte: new Date(1382825725501) } }, update: { $set: { resetTimestamp: new Date(1382825785000), running: true } }, fields: { payload: 1 }, sort: { priority: 1, created: 1 } }, client: "127.0.0.1:53045", desc: "conn4", threadId: "0x119024000", connectionId: 4, locks: { ^: "w", ^test: "W" }, waitingForLock: false, numYields: 0, lockStats: { timeLockedMicros: {}, timeAcquiringMicros: { r: 0, w: 3 } } }

The pattern is basically the same throughout the entire log for these 10,000 messages. There is a long sequence of findandmodify calls that take only about 1 ms per message, then the opid changes and there is a delay that can last almost a second. I have no idea whether this indicates anything important, but I am new to Mongo and am looking for any pattern that seems promising.
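The delay at each opid change can be measured straight from the log rather than estimated by eye. The following is a minimal sketch in plain JavaScript, not part of the original setup: the function names `parseLine` and `opidGaps` are made up for illustration, and the sample lines are shortened copies of the three excerpt lines above, keeping only the timestamp and the opid.

```javascript
// Shortened copies of the excerpt lines: only the timestamp and opid are kept.
const sampleLog = [
  'Sat Oct 26 15:15:25.189 [conn4] warning: ClientCursor::yield ... opid: 20064',
  'Sat Oct 26 15:15:25.190 [conn4] warning: ClientCursor::yield ... opid: 20064',
  'Sat Oct 26 15:15:25.507 [conn4] warning: ClientCursor::yield ... opid: 20141',
];

// Extract the time of day (in milliseconds) and the opid from one log line.
// Returns null for lines that do not contain both.
function parseLine(line) {
  const time = /(\d{2}):(\d{2}):(\d{2})\.(\d{3})/.exec(line);
  const opid = /opid: (\d+)/.exec(line);
  if (!time || !opid) return null;
  const [, h, m, s, ms] = time.map(Number);
  return { millis: ((h * 60 + m) * 60 + s) * 1000 + ms, opid: Number(opid[1]) };
}

// Report the time gap at every point where the opid changes.
// Assumption: the log does not cross midnight, so time of day is enough.
function opidGaps(lines) {
  const gaps = [];
  let previous = null;
  for (const line of lines) {
    const entry = parseLine(line);
    if (!entry) continue;
    if (previous && entry.opid !== previous.opid) {
      gaps.push({ from: previous.opid, to: entry.opid, gapMs: entry.millis - previous.millis });
    }
    previous = entry;
  }
  return gaps;
}

console.log(opidGaps(sampleLog));
```

For the three sample lines this reports one gap of 317 ms between opid 20064 and opid 20141. Run over the full log, summing `gapMs` would show how much of the 250 seconds falls at opid changes.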

UPDATE:

The query checks that the field 'running' is false, and it also checks that the earliestGet field is more recent than the epoch (i.e. 1-1-1970). I added indexes to these fields to no avail. Since these fields are the same for every message in the collection ('false' and Jan 1, 1970), maybe that is why indexing them only increases the query time. I have no idea what I should do to get this to work properly. It seems that it should grab the first record it finds that is newer than Jan 1, 1970, but apparently Mongo still goes through the whole collection, which makes the query too slow to be practical. Furthermore, even when I have NO selection criteria I still get a total time of 202 seconds: faster, but still unacceptable. I also still see those "yield can't unlock b/c of recursive lock ns:" messages, which I thought would only show up when querying un-indexed fields.

Answer 1

You are missing a critical index that would serve the query and sort portions of the findAndModify command. Without it, every command has to scan the entire collection and then sort the full result set, which is inefficient. You said "I have indexed the collection", but you only mentioned the '_id' index, which is always present and cannot help here.

Recommendation: at a minimum, add a compound index on the fields running and earliestGet. Including the sort fields in the index may also help, but since I would expect the number of matching documents for each query to be relatively small, the in-memory sort may be less of a factor.

Command:

db.abe.ensureIndex({running:1, earliestGet:1})

It turned out in the comments discussion that the running, earliestGet index was not at all selective. Since you are sorting only to fetch the first matching document, the alternative is to add an index on the sort fields:

Command:

db.abe.ensureIndex({ priority: 1, created: 1 })
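To illustrate why an index on the sort fields matters when every document matches the query, here is a toy model in plain JavaScript. It does not talk to MongoDB at all: arrays stand in for the collection and for an index ordered by priority and created, and every name in it (`makeQueue`, `dequeueWithScan`, `dequeueWithSortIndex`) is hypothetical. It only counts how many documents each approach has to look at to hand out one message, under the assumption from the question that all 10,000 messages have running: false and the same earliestGet.

```javascript
// Build n queue documents that all match the query, as described in the question.
// Insertion order is the reverse of "created" so that sorting actually matters.
function makeQueue(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({ _id: i, running: false, earliestGet: 0, priority: 0, created: n - i });
  }
  return docs;
}

const matches = (doc, now) => doc.running === false && doc.earliestGet <= now;
const byPriorityThenCreated = (a, b) => a.priority - b.priority || a.created - b.created;

// No usable index: look at every document, sort all matches, take the first.
function dequeueWithScan(docs, now) {
  let examined = 0;
  const candidates = [];
  for (const doc of docs) {
    examined++;
    if (matches(doc, now)) candidates.push(doc);
  }
  candidates.sort(byPriorityThenCreated);
  const doc = candidates[0] || null;
  if (doc) doc.running = true;
  return { doc, examined };
}

// "Index" already ordered by (priority, created): walk it and stop at the first match.
function dequeueWithSortIndex(orderedDocs, now) {
  let examined = 0;
  for (const doc of orderedDocs) {
    examined++;
    if (matches(doc, now)) {
      doc.running = true;
      return { doc, examined };
    }
  }
  return { doc: null, examined };
}

const now = Date.now();
const scanResult = dequeueWithScan(makeQueue(10000), now);
const indexResult = dequeueWithSortIndex(makeQueue(10000).sort(byPriorityThenCreated), now);
console.log('examined without index:', scanResult.examined);
console.log('examined with sort index:', indexResult.examined);
```

In this model the first dequeue examines all 10,000 documents without the index and a single document with it, and both return the same message. A real server involves more than this sketch captures, so the actual gain should be confirmed by running explain() on the query.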

Answer 2

Without a more detailed description of exactly what you are doing during the modify phase, it is hard to give a definitive answer. Judging from the log, it seems you perform updates like this:

db.abe.findAndModify({
    query: { running: false, earliestGet: { $lte: new Date(1382825725143) } },
    update: { $set: { resetTimestamp: new Date(1382825785000), running: true } }
})

and there is no index on the earliestGet and running fields. Because of its low cardinality, adding an index on running shouldn't make a real difference, but the lack of an index on earliestGet could be a real problem.
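Whether an index on a field can narrow the search depends on how many distinct values the field holds and what fraction of documents the query matches. A rough way to check this for a sample of documents is sketched below in plain JavaScript. The helper names `distinctCount` and `matchFraction` are made up, and the sample data is an assumption built from the question's description, where every message has running: false and an earliestGet of Jan 1, 1970.

```javascript
// Hypothetical sample shaped like the queue in the question:
// identical running and earliestGet values, distinct created timestamps.
const sample = [];
for (let i = 0; i < 10000; i++) {
  sample.push({ running: false, earliestGet: 0, created: 1382828000000 + i });
}

// Number of distinct values a field takes across the sample.
function distinctCount(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Fraction of documents a predicate matches; 1 means the predicate narrows nothing.
function matchFraction(docs, predicate) {
  if (docs.length === 0) return 0;
  return docs.filter(predicate).length / docs.length;
}

const queryNow = Date.now();
const fraction = matchFraction(sample, (doc) => doc.running === false && doc.earliestGet <= queryNow);

console.log('distinct running values:', distinctCount(sample, 'running'));
console.log('distinct earliestGet values:', distinctCount(sample, 'earliestGet'));
console.log('distinct created values:', distinctCount(sample, 'created'));
console.log('fraction matched by the query:', fraction);
```

With this sample, running and earliestGet each have a single distinct value and the query matches every document, so on data like this an index on either field cannot reduce the number of candidates. On a queue where earliestGet varies between messages the picture would be different.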

About the "warning: ClientCursor::yield can't unlock b/c of recursive lock ns:" message, see this question: MongoDB: Geting "Client Cursor::yield can't unlock b/c of recursive lock" warning when use findAndModify in two process instances


Answer 1
4, accepted, 2013-10-26 23:51:32

Answer 2
1, 2013-10-26 23:52:03
