Meteor app deployed to Digital Ocean stuck at 100% CPU and OOM

I have a Meteor (0.8.0) app deployed to Digital Ocean using Meteor Up that's been stuck at 100% CPU, only to crash with out of memory and start up again at 100% CPU. It's been stuck like this for the past 24 hours. The weird part is that nobody is using the server and meteor.log isn't showing many clues. I've got MongoHQ with oplog for the database.

Digital Ocean specs:

1GB RAM, 30GB SSD disk, New York 2, Ubuntu 12.04.3 x64

Screenshot showing issue: (image not preserved)

Note that the screenshot was captured yesterday, and the server has stayed pegged at 100% CPU until it crashes with out of memory. The log shows:

FATAL ERROR: Evacuation Allocation failed - process out of memory
error: Forever detected script was killed by signal: SIGABRT
error: Forever restarting script for 5 time

Top displays:

26308 meteorus 20 0 1573m 644m 4200 R 98.1 64.7 32:45.36 node

How it started: I have an app that takes in a list of emails via CSV or MailChimp OAuth, sends them off to FullContact via their batch process call (http://www.fullcontact.com/developer/docs/batch/), and then updates the Meteor collections depending on the response status. A snippet from a 200 response:

if (result.statusCode === 200) {
    var data = JSON.parse(result.content);
    var rate_limit = result.headers['x-rate-limit-limit'];
    var rate_limit_remaining = result.headers['x-rate-limit-remaining'];
    var rate_limit_reset = result.headers['x-rate-limit-reset'];
    console.log(rate_limit);
    console.log(rate_limit_remaining);
    console.log(rate_limit_reset);
    _.each(data.responses, function(resp, key) {
        var email = key.split('=')[1];
        if (resp.status === 200) {
            var sel = {
                email: email,
                listId: listId
            };
            Profiles.upsert({
                email: email,
                listId: listId
            }, {
                $set: sel
            }, function(err, result) {
                if (!err) {
                    console.log("Upsert ", result);
                    fullContactSave(resp, email, listId, Meteor.userId());
                }
            });
            RawCsv.update({
                email: email,
                listId: listId
            }, {
                $set: {
                    processed: true,
                    status: 200,
                    updated_at: new Date().getTime()
                }
            }, {
                multi: true
            });
        }
    });
}
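The post does not show how the email list is split before it is sent to the batch endpoint. As a minimal sketch, assuming the list is available as a plain array, it could be cut into fixed-size batches so that each batch can be sent and processed one at a time. The names chunk and BATCH_SIZE are hypothetical, and the value 20 is an assumption rather than a documented limit.

```javascript
// Sketch only: chunk() and BATCH_SIZE are hypothetical, not the original app's code.
var BATCH_SIZE = 20; // assumed value; check the batch API docs for the real limit

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (size < 1) {
    throw new Error('size must be at least 1');
  }
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: var batches = chunk(emails, BATCH_SIZE);
```

With 15,000 emails and a batch size of 20, this yields 750 batches.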

Locally on my wimpy Windows laptop running Vagrant, I have no performance issues whatsoever processing hundreds of thousands of emails at a time. But on Digital Ocean, it seems it can't even handle 15,000 (I've seen the CPU spike to 100% and then crash with OOM, but after it comes back up it usually stabilizes... not this time). What worries me is that the server hasn't recovered at all despite little or no activity on the app. I've verified this by looking at analytics: GA shows 9 sessions in total over the 24 hours, doing little more than hitting / and bouncing, and MixPanel shows only 1 logged-in user (me) in the same timeframe. The only thing I've done since the initial failure is check the facts package, which shows:

mongo-livedata
  observe-multiplexers 13
  observe-drivers-oplog 13
  oplog-watchers 16
  observe-handles 15
  time-spent-in-QUERYING-phase 87828
  time-spent-in-FETCHING-phase 82

livedata
  invalidation-crossbar-listeners 16
  subscriptions 11
  sessions 1

Meteor APM also doesn't show anything out of the ordinary, and meteor.log doesn't show any Meteor activity aside from the OOM and restart messages. MongoHQ isn't reporting any slow-running queries or much activity: on average 0 queries, updates, inserts, and deletes, judging from their monitoring dashboard. So as far as I can tell, there hasn't been much activity for 24 hours, and certainly nothing intensive. I've since tried to install newrelic and nodetime, but neither is quite working: newrelic shows no data, and meteor.log has a nodetime debug message:

Failed loaded nodetime-native extention.

So when I try to use nodetime's CPU profiler it turns up blank, and the heap snapshot returns with "Error: V8 tools are not loaded."

I'm basically out of ideas at this point, and since Node is pretty new to me it feels like I'm taking wild stabs in the dark here. Please help.

Update: The server is still pegged at 100% four days later. Even an init 6 doesn't help: the server restarts, the node process starts, and it jumps back up to 100% CPU. I tried other tools like memwatch and webkit-devtools-agent but could not get them to work with Meteor.
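When native profilers will not load, one low-tech option is Node's built-in process.memoryUsage(), which needs no extra modules. The following is a minimal sketch, not something from the original post: formatMemory and startMemoryLog are hypothetical helper names, and the 30-second interval in the example is an arbitrary choice.

```javascript
// Sketch only: helper names and the logging interval are illustrative.
function toMb(bytes) {
  return (bytes / 1024 / 1024).toFixed(1);
}

// Turn the object returned by process.memoryUsage() into one log line.
function formatMemory(usage) {
  return 'rss=' + toMb(usage.rss) + 'MB' +
    ' heapTotal=' + toMb(usage.heapTotal) + 'MB' +
    ' heapUsed=' + toMb(usage.heapUsed) + 'MB';
}

// Log memory figures every intervalMs milliseconds.
// Returns the timer so the caller can stop it with clearInterval.
function startMemoryLog(intervalMs) {
  var timer = setInterval(function () {
    console.log(new Date().toISOString(), formatMemory(process.memoryUsage()));
  }, intervalMs);
  // unref() keeps this timer from holding the process open on its own.
  if (timer.unref) {
    timer.unref();
  }
  return timer;
}

// Example: startMemoryLog(30 * 1000);
```

In a Meteor app this would presumably be called from server-side startup code, so the lines end up in meteor.log next to the restart messages and show whether the heap grows steadily before each crash.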

The following is the strace output:

strace -c -p 6840
Process 6840 attached - interrupt to quit
^CProcess 6840 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.17    0.073108           1    113701           epoll_wait
 11.15    0.010559           0     80106     39908 mmap
  6.66    0.006309           0    116907           read
  2.09    0.001982           0     84445           futex
  1.49    0.001416           0     45176           write
  0.68    0.000646           0    119975           munmap
  0.58    0.000549           0    227402           clock_gettime
  0.10    0.000095           0    117617           rt_sigprocmask
  0.04    0.000040           0     30471           epoll_ctl
  0.03    0.000031           0     71428           gettimeofday
  0.00    0.000000           0        36           mprotect
  0.00    0.000000           0         4           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.094735               1007268     39908 total

So it looks like the node process spends most of its system call time in epoll_wait.
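The figures in the strace -c table above can be checked with a little arithmetic. The sketch below copies the per-syscall rows from that output (times converted to microseconds) and recomputes the totals and the epoll_wait share. Note that strace -c counts time spent inside system calls, so the % time column is a share of the 0.094735 seconds of syscall time, not of the CPU usage that top reports.

```javascript
// Rows copied from the strace -c output: [syscall, time in microseconds, calls].
var rows = [
  ['epoll_wait', 73108, 113701],
  ['mmap', 10559, 80106],
  ['read', 6309, 116907],
  ['futex', 1982, 84445],
  ['write', 1416, 45176],
  ['munmap', 646, 119975],
  ['clock_gettime', 549, 227402],
  ['rt_sigprocmask', 95, 117617],
  ['epoll_ctl', 40, 30471],
  ['gettimeofday', 31, 71428],
  ['mprotect', 0, 36],
  ['brk', 0, 4]
];

// Sum the time and call columns and compute each syscall's share of the time.
function summarize(table) {
  var totalUsec = 0;
  var totalCalls = 0;
  table.forEach(function (row) {
    totalUsec += row[1];
    totalCalls += row[2];
  });
  var shares = {};
  table.forEach(function (row) {
    shares[row[0]] = (100 * row[1] / totalUsec).toFixed(2);
  });
  return { totalUsec: totalUsec, totalCalls: totalCalls, shares: shares };
}

var summary = summarize(rows);
console.log(summary.totalUsec, summary.totalCalls, summary.shares.epoll_wait);
```

The recomputed totals match the last row of the table: 94735 microseconds across 1007268 calls, with epoll_wait at 77.17% of that time.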

I had a similar issue. I didn't need oplog, and it was suggested that I add the Meteor package "disable-oplog". I did, and CPU usage dropped a lot. If you are not really taking advantage of oplog, it might be better to disable it, so run meteor add disable-oplog and see what happens.

I hope this helps.

Are you using Meteor Up? I also use New York 2.

In my local environment, an Ubuntu server in VirtualBox works great with only 512 MB and 1 core.

I'm having the same issue on a DigitalOcean VPS with 4 GB RAM and 2 cores, using Meteor Up (and my app, of course).

LOCAL ENVIRONMENT on VirtualBox - 1 CORE - 512 MB - New York 2 - ubuntu 14.04 x86.
-------------------------------------
>Meteor.js = 0.8.0,
>Node = 0.10.26,
>MongoDB shell version = 2.4.10,

>%CPU = 20.8 avg,
>%MEM = 27.4 avg

DIGITALOCEAN 4 GB RAM - 2 CPUS - ubuntu 14.04 x64.
-------------------------------------
>Meteor.js = 0.8.0,
>Node = 0.10.26,
>MongoDB shell version = 2.4.10,

>%CPU = 101.8 avg,
>%MEM = 27.4 avg

> PID meteoru+  20   0 1644244 796692   6228 R 102.2 32.7  84:47.08 node

Also, my app does something like yours. I'm using the CFS package from Atmosphere and node-csv to read the CSV that I upload. The upload works great, and node-csv works great too, so I can't confirm that either is the problem; it seems to be Node running on DigitalOcean. My MongoDB works great too.

I was new to VPS hosting, and the first thing I tried to do was run my script. The problem was that I had started the same server with node and pm2 a couple of times.

Solution

  1. Run pm2 kill to kill all processes run by your process manager.
  2. Run killall node to kill any node processes that remain.
  3. Run pm2 start <your_server>.js to run your server again.
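Before killing anything, it can help to confirm that several copies of the server really are running. The following is a minimal sketch, assuming the process list comes from `ps -eo pid,comm` (PID and command name columns); findNodeProcesses is a hypothetical helper, not part of pm2 or Node.

```javascript
// Sketch only: findNodeProcesses is a hypothetical helper.
// It expects the text printed by `ps -eo pid,comm`.
function findNodeProcesses(psOutput) {
  return psOutput.split('\n')
    .map(function (line) { return line.trim().split(/\s+/); })
    // Keep rows whose command name is exactly "node"; the header row is skipped.
    .filter(function (cols) { return cols.length >= 2 && cols[1] === 'node'; })
    .map(function (cols) { return Number(cols[0]); });
}

// Example (execSync requires Node 0.12 or later):
// var out = require('child_process').execSync('ps -eo pid,comm').toString();
// console.log(findNodeProcesses(out));
```

More than one PID in the result for a single app is a hint that the server was started several times. On systems where the binary is named nodejs, the comparison string would need to change accordingly.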
