简体   繁体   English

python-for循环中的多线程

[英]python - multi-threading in a for loop

This code runs ok for a little bit, then it gives me this error: 这段代码可以正常运行一点,然后给了我这个错误:

thread.error: can't start new thread

What am I doing wrong? 我究竟做错了什么? The names file is about 10,000 names long, the email file is about 5 emails long. 名称文件的长度约为10,000个名称,电子邮件文件的长度约为5个电子邮件。

for x in open(names):
    name = x.strip()


    def check(q):
        while True:
            email = q.get()
            lock.acquire()
            print email, name, threading.active_count()
            lock.release()

            #Do things in 
            #the internet

            q.task_done()
        return

    for i in range(threads):
            t = threading.Thread(target=check, args=(q,))
            t.setDaemon(True)
            t.start()

    for word in open(emails):
            q.put(word.strip())

    q.join()

I only specify 2 threads, but it ends up creating hundreds then crashes when the active_count is around 890. How can I fix this? 我仅指定2个线程,但最终会创建数百个线程,而当active_count大约为890时崩溃。如何解决此问题?

Here is a slightly modified version using a semaphore object 这是使用信号量对象的稍作修改的版本

import threading
import Queue

NUM_THREADS = 2 # you can change this if you want

semaphore = threading.Semaphore(NUM_THREADS)

threads = NUM_THREADS

running_threads = []

lock = threading.Lock()

q = Queue.Queue()

# moved the check function out of the loop
def check(name, q, s):
    # acquire the semaphore
    with s:
        not_empty = True

        while not_empty:

            try:
                email = q.get(False) # we are passing false so it won't block.
            except Queue.Empty, e:
                not_empty = False
                break

            lock.acquire()

            print email, name, threading.active_count()

            lock.release()

            # additional work ...

            q.task_done()

for x in open(names):
    name = x.strip()

    for word in open(emails):
        q.put(word.strip())

    for i in range(threads):
            t = threading.Thread(target=check, args=(name, q, semaphore))
            # t.setDaemon(True) # we are not setting the damenon flag
            t.start()

            running_threads.append(t)

    # joining threads (we need this if the daemon flag is false)
    for t in running_threads:
        t.join()

    # joining queue (Probably won't need this if the daemon flag is false)
    q.join()

You could simplify your code using a thread pool: 您可以使用线程池简化代码:

from contextlib import closing
from itertools import product
from multiprocessing.dummy import Pool # thread pool

def foo(arg):
    name, email = map(str.strip, arg)
    try:
        # "do things in the internet"
    except Exception as e:
        return (name, email), None, str(e)
    else:
        return (name, email), result, None

with open(names_filename) as names_file, \
     open(emails_filename) as emails_file, \
     closing(Pool(max_threads_count)) as pool:
    args = product(names_file, emails_file)
    it = pool.imap_unordered(foo, args, chunksize=100)
    for (name, email), result, error in it:
        if error is not None:
            print("Failed to foo {} {}: {}".format(name, email, error))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM