Python: cannot create more than ~800 threads
Below is my code; I'm really new to Python. The code creates a lot of threads (more than 1000), but at some point, at around 800 threads, I get the error "error: can't start new thread". I've read a bit about thread pools, but I don't really understand them. How can I implement a thread pool in my code? Or at least, please explain it to me in simple terms.
#!/usr/bin/python
import threading
import urllib

lock = threading.Lock()

def get_wip_info(query_str):
    try:
        temp = urllib.urlopen(query_str).read()
    except:
        temp = 'ERROR'
    return temp

def makeURLcall(arg1, arg2, arg3, file_output, dowhat, result):
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    lock.acquire()
    report = open(file_output, "a")
    report.writelines("%s - %s\n" % (arg1, result))
    report.close()
    lock.release()
    return

testername = "arg1"
stationcode = "arg2"
dowhat = "OUT"
result = "PASS"
file_source = "sourcefile.txt"
file_output = "resultfile.txt"

readfile = open(file_source, "r")
Data = readfile.readlines()

threads = []
for SNs in Data:
    SNs = SNs.strip()
    print SNs
    thread = threading.Thread(target=makeURLcall,
                              args=(SNs, stationcode, testername,
                                    file_output, dowhat, result))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Don't implement your own thread pool; use the one that ships with Python. On Python 3, you can use concurrent.futures.ThreadPoolExecutor directly. On Python 2.6 and higher, you can import Pool from multiprocessing.dummy, which mirrors the multiprocessing API but is backed by threads instead of processes. Of course, if you need to do CPU-bound work in CPython (the reference interpreter), you need real multiprocessing, not multiprocessing.dummy; Python threads are fine for I/O-bound work, but the GIL makes them quite bad for CPU-bound work.
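For reference, on Python 3 the same bounded-worker pattern with concurrent.futures.ThreadPoolExecutor looks roughly like this (a minimal sketch with a toy `fetch` function standing in for the real URL call):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(n):
    # Stand-in for the real urlopen call; just echoes its input.
    return "result-%d" % n

# max_workers caps the number of threads, so you never hit the OS limit
# no matter how many tasks you submit.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(fetch, range(100)))

print(results[0])  # -> result-0
```

`Executor.map` returns results in submission order, so the output lines stay aligned with the inputs even though they were fetched concurrently.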
The code below replaces your explicit use of Thread with a Pool from multiprocessing.dummy, using a fixed number of worker threads, each of which pulls another task as soon as it finishes the previous one, instead of an unbounded number of threads each doing a single task. First, since local I/O is probably fairly cheap and you want to serialize the output, we'll have the worker task return the result data instead of writing it out itself, and let the main thread do the writes to local disk (no more need for the lock, or for reopening the file over and over). Change makeURLcall to:
# Accept args as a single sequence to ease use of imap_unordered,
# and unpack on the first line
def makeURLcall(args):
    arg1, arg2, arg3, dowhat, result = args
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    return "%s - %s\n" % (arg1, result)
Now, the code that replaces your explicit use of threads:
import multiprocessing.dummy as mp
from contextlib import closing

# Open input and output files and create the pool
# Odds are that 32 is enough workers to saturate the connection,
# but you can play around; somewhere between 16 and 128 is likely to be
# the sweet spot for network I/O
with open(file_source) as inf,\
     open(file_output, 'w') as outf,\
     closing(mp.Pool(32)) as pool:
    # Define a generator that creates tuples of arguments to pass to
    # makeURLcall. We also read the file lazily instead of using
    # readlines, to start producing results faster
    tasks = ((SNs.strip(), stationcode, testername, dowhat, result)
             for SNs in inf)
    # Pull and write results from the workers as they become available
    outf.writelines(pool.imap_unordered(makeURLcall, tasks))
# Once we leave the with block, input and output files are closed, and
# the pool workers are cleaned up