mongodb service crashes after saying serverStatus was very slow

I have a MongoDB service on EC2. It automatically crashes after some time. When I ran systemctl status mongodb, it gave me the following output:

● mongodb.service - High-performance, schema-free document-oriented database
   Loaded: loaded (/etc/systemd/system/mongodb.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Fri 2017-02-17 02:00:00 UTC; 3h 37min ago
 Main PID: 1152 (code=killed, signal=KILL)

Feb 17 02:00:00 ip-172-31-29-240 systemd[1]: mongodb.service: Main process exited, code=killed, status=9/KILL
Feb 17 02:00:00 ip-172-31-29-240 systemd[1]: mongodb.service: Unit entered failed state.
Feb 17 02:00:00 ip-172-31-29-240 systemd[1]: mongodb.service: Failed with result 'signal'.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

When I checked the /var/log/mongodb/mongod.log file, I got this:

2017-02-20T11:11:05.624+0000 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] MongoDB starting : pid=29177 port=27017 dbpath=/var/lib/mongodb 64-bit host=ip-172-31-29-240
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] db version v3.2.11
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] git version: 009580ad490190ba33d1c6253ebd8d91808923e4
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] modules: none
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] build environment:
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1", port: 27017 }, storage: { dbPath: "/var/lib/mongodb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log", quiet: true } }
2017-02-20T11:11:05.665+0000 I -        [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-02-20T11:11:05.665+0000 W -        [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
2017-02-20T11:11:05.665+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-02-20T11:11:05.665+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] 
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] 
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] 
2017-02-20T11:11:06.280+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/var/lib/mongodb/diagnostic.data'
2017-02-20T11:11:06.280+0000 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2017-02-20T11:11:06.283+0000 I NETWORK  [initandlisten] waiting for connections on port 27017
2017-02-20T11:11:28.189+0000 I COMMAND  [conn3] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { myFlag: true, timestamp: { $lt: new Date(1485907200000) } } }, { $group: { _id: null, startDate: { $min: "$timestamp" }, previousCount: { $sum: 1 } } }, { $project: { _id: 0, startDate: 1, previousCount: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:236 reslen:114 locks:{ Global: { acquireCount: { r: 478 } }, Database: { acquireCount: { r: 239 } }, Collection: { acquireCount: { r: 239 } } } protocol:op_query 2605ms
2017-02-20T11:11:28.192+0000 I COMMAND  [conn8] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { myFlag: true, timestamp: { $lt: new Date(1485907200000) } } }, { $group: { _id: null, startDate: { $min: "$timestamp" }, previousCount: { $sum: 1 } } }, { $project: { _id: 0, startDate: 1, previousCount: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:244 reslen:114 locks:{ Global: { acquireCount: { r: 494 } }, Database: { acquireCount: { r: 247 } }, Collection: { acquireCount: { r: 247 } } } protocol:op_query 2764ms
2017-02-20T11:11:28.235+0000 I COMMAND  [conn5] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910685417) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:244 reslen:538 locks:{ Global: { acquireCount: { r: 494 } }, Database: { acquireCount: { r: 247 } }, Collection: { acquireCount: { r: 247 } } } protocol:op_query 2817ms
2017-02-20T11:11:28.303+0000 I COMMAND  [conn8] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910685581) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 110ms
2017-02-20T11:11:28.438+0000 I COMMAND  [conn3] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910688306) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 129ms
2017-02-20T11:11:28.444+0000 I COMMAND  [conn5] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910688272) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 154ms
2017-02-20T11:12:15.504+0000 I COMMAND  [conn1] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910735374) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:143 reslen:538 locks:{ Global: { acquireCount: { r: 292 } }, Database: { acquireCount: { r: 146 } }, Collection: { acquireCount: { r: 146 } } } protocol:op_query 128ms
2017-02-20T11:12:15.582+0000 I COMMAND  [conn5] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910735439) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:145 reslen:538 locks:{ Global: { acquireCount: { r: 296 } }, Database: { acquireCount: { r: 148 } }, Collection: { acquireCount: { r: 148 } } } protocol:op_query 140ms
2017-02-20T11:12:15.588+0000 I COMMAND  [conn8] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910735428) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 155ms
2017-02-20T11:12:24.765+0000 I COMMAND  [conn3] query myDataBase.myCollection query: { $query: {}, orderby: { timestamp: -1 } } planSummary: COLLSCAN, COLLSCAN cursorid:47222766812 ntoreturn:200 ntoskip:0 keysExamined:0 docsExamined:18423 hasSortStage:1 keyUpdates:0 writeConflicts:0 numYields:145 nreturned:37 reslen:4578802 locks:{ Global: { acquireCount: { r: 292 } }, Database: { acquireCount: { r: 146 } }, Collection: { acquireCount: { r: 146 } } } 177ms
2017-02-20T11:12:25.268+0000 I COMMAND  [conn3] getmore myDataBase.myCollection planSummary: COLLSCAN, COLLSCAN cursorid:47222766812 ntoreturn:200 keysExamined:0 docsExamined:18423 keyUpdates:0 writeConflicts:0 numYields:151 nreturned:200 reslen:2664936 locks:{ Global: { acquireCount: { r: 304 } }, Database: { acquireCount: { r: 152 } }, Collection: { acquireCount: { r: 152 } } } 321ms
2017-02-20T11:12:25.518+0000 I COMMAND  [conn3] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { myFlag: true, timestamp: { $lt: new Date(1485907200000) } } }, { $group: { _id: null, startDate: { $min: "$timestamp" }, previousCount: { $sum: 1 } } }, { $project: { _id: 0, startDate: 1, previousCount: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:114 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 173ms
2017-02-20T11:12:25.691+0000 I COMMAND  [conn3] command myDataBase.compressionstats command: aggregate { aggregate: "compressionstats", pipeline: [ { $group: { _id: null, originalDataSum: { $sum: "$originalDataSize" }, compressedDataSum: { $sum: "$compressedDataSize" } } } ] } keyUpdates:0 writeConflicts:0 numYields:2 reslen:125 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 5 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_query 127ms
2017-02-20T11:12:26.525+0000 I WRITE    [conn3] update myDataBase.sessions query: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH" } update: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH", session: "{"cookie":{"originalMaxAge":null,"expires":null,"httpOnly":true,"path":"/"},"passport":{"user":"587cc6630bd9ff5c39e54e94"}}", expires: new Date(1488798745083) } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:1 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 601ms
2017-02-20T11:12:26.532+0000 I COMMAND  [conn2] command secondDB.secondCollection command: getMore { getMore: 16340341641, collection: "secondCollection", batchSize: 1000 } planSummary: COLLSCAN cursorid:16340341641 keysExamined:0 docsExamined:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:0 reslen:103 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 1372ms
2017-02-20T11:12:26.570+0000 I COMMAND  [conn3] command myDataBase.$cmd command: update { update: "sessions", writeConcern: { w: 1 }, ordered: true, updates: [ { q: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH" }, u: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH", session: "{"cookie":{"originalMaxAge":null,"expires":null,"httpOnly":true,"path":"/"},"passport":{"user":"587cc6630bd9ff5c39e54e94"}}", expires: new Date(1488798745083) }, multi: false, upsert: true } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 702ms
2017-02-20T11:12:26.905+0000 I COMMAND  [conn3] query myDataBase.users query: { _id: ObjectId('587cc6630bd9ff5c39e54e94') } planSummary: IDHACK ntoskip:0 keysExamined:1 docsExamined:1 idhack:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:524 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } 204ms
2017-02-20T11:12:27.126+0000 I COMMAND  [conn3] killcursors myDataBase.myCollection keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 86ms
2017-02-20T11:12:33.002+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 120, after asserts: 170, after connections: 370, after extra_info: 580, after globalLock: 690, after locks: 990, after network: 1070, after opcounters: 1170, after opcountersRepl: 1240, after storageEngine: 1680, after tcmalloc: 1910, after wiredTiger: 2390, at end: 2820 }

NOTE: The service status output and the log are from different days; the issue was the same, I just didn't copy the service status output this time.

I had the same problem. After increasing memory the error disappeared.
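
If you want to confirm memory pressure before resizing the instance, a quick check from the mongo shell may help (just a sketch; the mem and wiredTiger.cache fields below are as reported by serverStatus on MongoDB 3.x, with resident/virtual in MB and the cache figures in bytes):

    // mongod's own view of its memory usage (MB).
    var mem = db.serverStatus().mem;
    print("resident MB: " + mem.resident + ", virtual MB: " + mem.virtual);

    // WiredTiger cache usage versus its configured maximum (bytes).
    var cache = db.serverStatus().wiredTiger.cache;
    print("cache bytes in use: " + cache["bytes currently in the cache"]);
    print("cache max bytes:    " + cache["maximum bytes configured"]);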

The root cause of the problem could be some slow queries running on the MongoDB instance.

So, to find the slow queries, follow these steps:

  1. Log in to the mongo console.

  2. Switch to your database.

  3. Find slow queries or operations with the following command:

    db.currentOp({"secs_running": {$gte: 3}})

    So this will give the operations that have been running for more than 3 seconds. The output could look like this:

     rs:SECONDARY> db.currentOp({"secs_running": {$gte: 3}});
     {
         "inprog" : [
             {
                 "desc" : "tttt",
                 "threadId" : "140558301292288",
                 "connectionId" : 307551,
                 "client" : "101.01.01.11:49128",
                 "appName" : "MongoDB Shell",
                 "clientMetadata" : {
                     "application" : { "name" : "MongoDB Shell" },
                     "driver" : { "name" : "MongoDB Internal Client", "version" : "3.0.5-18-g7e327a9" },
                     "os" : { "type" : "Darwin", "name" : "Mac OS X", "architecture" : "x86_64", "version" : "19.3.0" }
                 },
                 "active" : true,
                 "opid" : 912509900,
                 "secs_running" : 7711,
                 "op" : "query",
                 "ns" : "mydb.mytable",
                 "query" : { "find" : "mytable", "filter" : { "name" : { "$type" : 3 } } },
                 "planSummary" : "COLLSCAN",
                 "numYields" : 333308,
                 "locks" : { "Global" : "r", "Database" : "r", "Collection" : "r" },
                 "waitingForLock" : false,
                 "lockStats" : {
                     "Global" : {
                         "acquireCount" : { "r" : NumberLong(666618) },
                         "acquireWaitCount" : { "r" : NumberLong(254219) },
                         "timeAcquiringMicros" : { "r" : NumberLong("3454835364") }
                     },
                     "Database" : { "acquireCount" : { "r" : NumberLong(333309) } },
                     "Collection" : { "acquireCount" : { "r" : NumberLong(333309) } }
                 }
             }
         ],
         "ok" : 1
     }

  4. Now you can kill the long-running operations by taking the "opid" from the list above, as follows (a small loop that automates steps 3 and 4 is sketched after this list):

     db.killOp(912509900)
  5. Hopefully the overall system load will come down, and you will not see the "serverStatus was very slow" messages again.

  6. And your system should not crash either.
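
If several operations are slow at the same time, steps 3 and 4 can be combined into a small mongo shell loop. This is only a sketch: as written it kills every user operation that has been running for more than 3 seconds, so review the list first and tighten the filter so you do not kill legitimate long jobs or internal/replication operations:

     // Kill every operation that has been running for more than 3 seconds.
     var ops = db.currentOp({"secs_running": {$gte: 3}}).inprog;
     ops.forEach(function (op) {
         print("killing opid " + op.opid + " (" + op.op + " on " + op.ns + ", running " + op.secs_running + "s)");
         db.killOp(op.opid);
     });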
