
gevent: downside to spawning large number of greenlets?

Following on from my question in the comments on this answer to the question "Gevent pool with nested web requests":

Assuming one has a large number of tasks, is there any downside to using gevent.spawn(...) to spawn all of them simultaneously rather than using a gevent pool and pool.spawn(...) to limit the number of concurrent greenlets?

Formulated differently: is there any advantage to "limiting concurrency" with a gevent.Pool even when the problem itself doesn't require it?

Any idea what would constitute a "large number" for this issue?
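
For concreteness, here is a minimal sketch of the two approaches being compared (the task body, task count, and pool size are placeholders, not from the original question):

import gevent
from gevent.pool import Pool

def task(n):
    gevent.sleep(1)  # stand-in for real I/O-bound work

# Approach 1: spawn everything at once, no concurrency limit
greenlets = [gevent.spawn(task, n) for n in range(10000)]
gevent.joinall(greenlets)

# Approach 2: cap concurrency with a Pool
pool = Pool(200)  # at most 200 greenlets run at a time
for n in range(10000):
    pool.spawn(task, n)
pool.join()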

It's just cleaner and good practice when dealing with a lot of stuff. I ran into this a few weeks ago when I was using gevent spawn to verify a bunch of emails against DNS, on the order of 30k :).

from gevent.pool import Pool
import logging

logging.basicConfig(level=logging.INFO)

rows = [...]  # a large list of stuff; each row assumed to be a (param1, param2) pair
CONCURRENCY = 200  # run 200 greenlets at once, or whatever you want
pool = Pool(CONCURRENCY)
count = 0

def do_work_function(param1, param2):
    print(param1 + param2)

for row in rows:
    count += 1  # for logging purposes, to track progress
    logging.info(count)
    pool.spawn(do_work_function, *row)  # blocks here once pool size == CONCURRENCY

pool.join()  # blocks here until the last 200 are complete

I found in my testing that with CONCURRENCY around 200, my machine load would hover around 1 on an EC2 m1.small. I did it a little naively though; if I were to do it again I'd run multiple pools and sleep some time between them to try to distribute the load on the NIC and CPU more evenly.
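
A rough sketch of that batching idea, reusing rows and do_work_function from the snippet above (the batch size and pause are made-up values for illustration):

import gevent
from gevent.pool import Pool

BATCH_SIZE = 5000   # hypothetical: rows handled per pool
PAUSE = 2           # hypothetical: seconds to idle between batches

for start in range(0, len(rows), BATCH_SIZE):
    batch = rows[start:start + BATCH_SIZE]
    pool = Pool(200)
    for row in batch:
        pool.spawn(do_work_function, *row)
    pool.join()          # wait for this batch to drain
    gevent.sleep(PAUSE)  # let NIC/CPU load settle before the next batch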

One last thing to keep in mind: keep an eye on your open files and increase the limit if need be: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files . The greenlets I was running were taking up around 5 file descriptors each, so you can run out pretty quickly if you aren't careful. This may not be helpful if your system load is above one, as you'll start seeing diminishing returns regardless.
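
If you want to inspect or raise the per-process descriptor limit from Python itself rather than the shell, the standard-library resource module can do it on Linux (a sketch; the target value of 65536 is just an example):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%d hard=%d" % (soft, hard))

# Raise the soft limit toward the hard limit if many greenlets are expected,
# e.g. at ~5 descriptors per greenlet as observed above.
wanted = min(hard, 65536)
if soft < wanted:
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))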

Came here from Google and decided to run a few quick tests spawning increasing numbers N of greenlets. Sharing the results as they might be useful to fellow searchers:

# 1 greenlet
real    0m1.032s
user    0m0.017s
sys     0m0.009s

# 100 greenlets
real    0m1.037s
user    0m0.021s
sys     0m0.010s

# 1,000 greenlets
real    0m1.045s
user    0m0.035s
sys     0m0.013s

# 10,000 greenlets
real    0m1.232s
user    0m0.265s
sys     0m0.059s

# 100,000 greenlets
real    0m3.992s
user    0m3.201s
sys     0m0.444s

So up to 1,000 greenlets the performance loss is tiny, but once you start hitting 10,000+ greenlets, everything slows down.

Test code:

import gevent

N = 1000  # number of greenlets to spawn; varied per run (1; 100; 1,000; 10,000; 100,000)

def test():
    gevent.sleep(1)

for _ in range(N):
    gevent.spawn(test)

gevent.wait()
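
Presumably each result above came from running the script under the shell's time command with a different N; an in-process variant that times all the sizes in one run might look like this (a sketch, not the original benchmark):

import time
import gevent

def test():
    gevent.sleep(1)

for n in (1, 100, 1000, 10000, 100000):
    start = time.time()
    for _ in range(n):
        gevent.spawn(test)
    gevent.wait()
    print("%7d greenlets: %.3fs" % (n, time.time() - start))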
