OSError: [Errno 24] Too many open files when using reactor.run() in Twisted
I am having a weird issue: I am running a large number of utils.getProcessOutputAndValue('cmd', [args]) commands, and the result depends on whether I started the reactor using task.react() or reactor.run().
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from progress.bar import IncrementalBar
from twisted.internet import defer
from twisted.internet import task
from twisted.internet import utils
from twisted.python import usage


class Options(usage.Options):
    optFlags = [['reactor', 'r', 'Use reactor.run().'],
                ['task', 't', 'Use task.react().'],
                ['cwr', 'w', 'Use callWhenRunning().']]
    optParameters = [['limit', 'l', 255, 'Number of file descriptors to open.'],
                     ['cmd', 'c', 'echo Testing {i}...', 'Command to run.']]


def run(opt):
    limit = int(opt['limit'])
    cmd, args = opt['cmd'].split(' ', 1)
    bar = IncrementalBar('Running {cmd}'.format(cmd=opt['cmd']), max=limit)
    requests = []
    for i in range(0, limit):
        try:
            _args = args.format(i=i)
            args = _args
        except KeyError:
            pass
        requests.append(utils.getProcessOutputAndValue('echo', [args]))
        bar.next()
    bar.finish()
    return defer.gatherResults(requests)


@defer.inlineCallbacks
def main(reactor, opt):
    d = defer.Deferred()
    limit = int(opt['limit'])
    cmd, args = opt['cmd'].split(' ', 1)
    bar = IncrementalBar('Running {cmd}'.format(cmd=opt['cmd']), max=limit)
    for i in range(0, limit):
        try:
            _args = args.format(i=i)
            args = _args
        except KeyError:
            pass
        yield utils.getProcessOutputAndValue('echo', [args])
        bar.next()
    bar.finish()
    defer.returnValue(d.callback(True))


if __name__ == '__main__':
    opt = Options()
    opt.parseOptions()
    if opt['reactor']:
        from twisted.internet import reactor
        task.deferLater(reactor, 0, run, opt)
        reactor.run()
    elif opt['task']:
        from twisted.internet.task import react
        react(main, [opt])
    elif opt['cwr']:
        from twisted.internet import reactor
        reactor.callWhenRunning(run, opt)
        reactor.run()
When using a limit above 400 (in my case), I get the following error:
Upon execvpe echo ['echo', 'Testing 0...'] in environment id 42131264
:Traceback (most recent call last):
  File "/home/vagrant/.env/sm/lib/python2.7/site-packages/Twisted-15.5.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 428, in _fork
    self._setupChild(**kwargs)
  File "/home/vagrant/.env/sm/lib/python2.7/site-packages/Twisted-15.5.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 803, in _setupChild
    for fd in _listOpenFDs():
  File "/home/vagrant/.env/sm/lib/python2.7/site-packages/Twisted-15.5.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 638, in _listOpenFDs
    return detector._listOpenFDs()
  File "/home/vagrant/.env/sm/lib/python2.7/site-packages/Twisted-15.5.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 553, in _listOpenFDs
    self._listOpenFDs = self._getImplementation()
  File "/home/vagrant/.env/sm/lib/python2.7/site-packages/Twisted-15.5.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 576, in _getImplementation
    after = impl()
  File "/home/vagrant/.env/sm/lib/python2.7/site-packages/Twisted-15.5.0-py2.7-linux-x86_64.egg/twisted/internet/process.py", line 606, in _procFDImplementation
    return [int(fd) for fd in self.listdir(dname)]
OSError: [Errno 24] Too many open files: '/proc/23421/fd'
Unhandled error in Deferred:
This does not occur if I use task.react().
In summary:
python pyerr.py -l100 -r : OK
python pyerr.py -l100 -t : OK
python pyerr.py -l100 -w : OK
python pyerr.py -l400 -r : OSERR
python pyerr.py -l400 -t : OK
python pyerr.py -l400 -w : OSERR
The problem is that I have a big application that uses the reactor, because it is an application responding to SMTP connections (so I cannot use task.react, as I do not want to stop the reactor).
I always thought that task.react only stopped the reactor once the Deferred is done, but I guess it is doing more than this...
edit: Here is a pstree comparison for task.react vs reactor.run.
reactor.run (python pyerr.py -l400 -r):
init-+-VBoxService---7*[{VBoxService}]
     |-acpid
     |-atd
     |-cron
     |-dbus-daemon
     |-dhclient
     |-6*[getty]
     |-master-+-pickup
     |        `-qmgr
     |-mysqld---18*[{mysqld}]
     |-nginx---4*[nginx]
     |-php5-fpm---2*[php5-fpm]
     |-puppet---{puppet}
     |-rpc.idmapd
     |-rpc.statd
     |-rpcbind
     |-rsyslogd---3*[{rsyslogd}]
     |-ruby---{ruby}
     |-sshd-+-3*[sshd---sshd---sftp-server]
     |      |-sshd---sshd---2*[sftp-server]
     |      |-sshd---sshd---bash---pstree
     |      `-sshd---sshd---bash---python-+-323*[echo]
     |                                    `-5*[python]
     |-systemd-logind
     |-systemd-udevd
     |-upstart-file-br
     |-upstart-socket-
     `-upstart-udev-br
task.react (python pyerr.py -l400 -t):
init-+-VBoxService---7*[{VBoxService}]
     |-acpid
     |-atd
     |-cron
     |-dbus-daemon
     |-dhclient
     |-6*[getty]
     |-master-+-pickup
     |        `-qmgr
     |-mysqld---18*[{mysqld}]
     |-nginx---4*[nginx]
     |-php5-fpm---2*[php5-fpm]
     |-puppet---{puppet}
     |-rpc.idmapd
     |-rpc.statd
     |-rpcbind
     |-rsyslogd---3*[{rsyslogd}]
     |-ruby---{ruby}
     |-sshd-+-3*[sshd---sshd---sftp-server]
     |      |-sshd---sshd---2*[sftp-server]
     |      |-sshd---sshd---bash---pstree
     |      `-sshd---sshd---bash---python---echo
     |-systemd-logind
     |-systemd-udevd
     |-upstart-file-br
     |-upstart-socket-
     `-upstart-udev-br
Notice the difference between this
     |      `-sshd---sshd---bash---python-+-323*[echo]
     |                                    `-5*[python]
and this
     |      `-sshd---sshd---bash---python---echo
In one case, it seems that processes are not closed as soon as they complete.
I have tested this issue on 4 different machines, and the issue is exactly the same on each.
To give it a shot, try running watch -n 0.1 "pstree" to see how the processes evolve.
edit: I now understand why this is happening, thanks to Glyph's answer, but how do I adapt this to my real-life case?
The application I am developing with Twisted is an SMTP filter based on Milter. Here is how it works (assume we want to check the email signature):
The milter calls a remote "module" server, which handles the signature check using a /usr/bin/openssl mime call.
In this case, my problem is that if I get 150 simultaneous connections, there will be 150 calls to the module (TCP protocol), and this module will invoke the openssl command once per connection.
The module is completely agnostic, and therefore will not know whether other calls are running. Where should I put the DeferredSemaphore, in your opinion?
My problem here is that SMTP connections are also agnostic and don't know about other possibly open connections.
What is the correct way of handling this parallelism, in your opinion?
The problem here has nothing to do with the distinction between task.react and reactor.run, but rather with the subtle but significant difference between the implementations of your run and main functions.
The difference is that run spawns limit processes in parallel, racking up thousands of simultaneous open file descriptors and easily blowing through your system's limitations. main, however, waits for every process to completely finish executing before even starting the next one, which means it never uses more than 4 or 5 at a time.
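To put a rough number on "your system's limitations", the sketch below reads the per-process file descriptor limit with the standard resource module (Unix only) and estimates how many children fit under it. The helper names, the figure of 3 parent-side descriptors per child (one pipe each for stdin, stdout and stderr, which is my understanding of the defaults used here), and the reserve of 64 are assumptions for illustration, not measurements from the question.

```python
import resource


def children_that_fit(soft_limit, fds_per_child=3, reserve=64):
    """Estimate how many child processes fit under a descriptor limit.

    fds_per_child=3 assumes one parent-side pipe each for stdin, stdout
    and stderr; reserve is an arbitrary allowance for descriptors the
    process already holds. Both values are illustrative assumptions.
    """
    return max(0, (soft_limit - reserve) // fds_per_child)


def current_estimate():
    """Apply the estimate to this process's soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None  # no soft limit configured
    return children_that_fit(soft)
```

With a soft limit of 1024 (a common default, not confirmed for the machines in the question), children_that_fit(1024) gives 320, which is in the same range as the 323 echo processes visible in the pstree output.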
The reason is that main is decorated with inlineCallbacks and yields every getProcessOutputAndValue Deferred, which suspends execution of main until that Deferred has completed.
In real applications, neither of these approaches is ideal. You want some parallelism, but not unlimited. Twisted comes with some utilities, such as DeferredSemaphore, to facilitate limited parallelism without restricting everything to run only one task at a time. Jean-Paul Calderone wrote an article (10 years ago!) that explains how to use this, here.
However, just to demonstrate that the issue has nothing to do with task.react, here's a modified version of your example which eliminates the run function and makes an apples-to-apples comparison using main:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from progress.bar import IncrementalBar
from twisted.internet import defer
from twisted.internet import task
from twisted.internet import utils
from twisted.python import usage


class Options(usage.Options):
    optFlags = [['reactor', 'r', 'Use reactor.run().'],
                ['task', 't', 'Use task.react().'],
                ['cwr', 'w', 'Use callWhenRunning().']]
    optParameters = [['limit', 'l', 255, 'Number of file descriptors to open.'],
                     ['cmd', 'c', 'echo Testing {i}...', 'Command to run.']]


@defer.inlineCallbacks
def main(reactor, opt):
    d = defer.Deferred()
    limit = int(opt['limit'])
    cmd, args = opt['cmd'].split(' ', 1)
    bar = IncrementalBar('Running {cmd}'.format(cmd=opt['cmd']), max=limit)
    for i in range(0, limit):
        try:
            _args = args.format(i=i)
            args = _args
        except KeyError:
            pass
        yield utils.getProcessOutputAndValue('echo', [args])
        bar.next()
    bar.finish()
    defer.returnValue(d.callback(True))


if __name__ == '__main__':
    opt = Options()
    opt.parseOptions()
    if opt['reactor']:
        from twisted.internet import reactor
        task.deferLater(reactor, 0, main, reactor, opt)
        reactor.run()
    elif opt['task']:
        from twisted.internet.task import react
        react(main, [opt])
    elif opt['cwr']:
        from twisted.internet import reactor
        reactor.callWhenRunning(main, reactor, opt)
        reactor.run()
edit, responding to the edit in the question:
Since your real problem is with incoming connections, and not just a for loop, rather than using DeferredSemaphore you might instead need to maintain a counter. You can take advantage of the fact that the object returned from listenTCP, or the result of the Deferred that comes back from TCP4ServerEndpoint's listen(), implements IPushProducer: call pauseProducing() on it when too many concurrent connections are doing work, and resumeProducing() when that work is done.
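The counter idea could look roughly like the sketch below. ConnectionThrottle and its method names are hypothetical, invented for this illustration; the only thing it relies on is that the port object has pauseProducing() and resumeProducing(), as described above.

```python
class ConnectionThrottle(object):
    """Pause a listening port while too many jobs are in flight.

    Hypothetical helper: port is the object returned by listenTCP
    (or the result of an endpoint's listen() Deferred), and max_busy
    is the number of concurrent jobs at which accepting is paused.
    """

    def __init__(self, port, max_busy):
        self.port = port
        self.max_busy = max_busy
        self.busy = 0
        self.paused = False

    def started(self):
        """Call when a connection begins its expensive work."""
        self.busy += 1
        if self.busy >= self.max_busy and not self.paused:
            self.paused = True
            self.port.pauseProducing()

    def finished(self, result=None):
        """Call when the work is done; passes result through unchanged."""
        self.busy -= 1
        if self.busy < self.max_busy and self.paused:
            self.paused = False
            self.port.resumeProducing()
        return result

    def track(self, d):
        """Count a Deferred-returning job from now until it fires."""
        self.started()
        return d.addBoth(self.finished)
```

A protocol could then wrap each expensive call, for example throttle.track(utils.getProcessOutputAndValue(...)), with one throttle shared by the factory. Because finished() returns its argument, both results and failures continue down the callback chain untouched.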