
How can I use threading in Python?

I am trying to understand threading in Python. I've looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I'm having trouble understanding them.

How do you clearly show tasks being divided for multi-threading?

Since this question was asked in 2010, there has been a real simplification in how to do simple multithreading in Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below - it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

Which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
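
For readers who have not used map before, here is a minimal sketch of the built-in version next to the loop it replaces; the square function is only an illustration:

def square(x):
    return x * x

numbers = [1, 2, 3, 4]

# map applies square to every element; list() materializes the results
squared = list(map(square, numbers))      # [1, 4, 9, 16]

# ...which is equivalent to this loop:
squared_loop = []
for n in numbers:
    squared_loop.append(square(n))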



Implementation

Parallel versions of the map function are provided by two libraries: multiprocessing, and also its little known, but equally fantastic stepchild: multiprocessing.dummy.

multiprocessing.dummy is exactly the same as the multiprocessing module, but uses threads instead (an important distinction - use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds
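
The snippet above uses Python 2's urllib2. As a rough sketch, a Python 3 equivalent can swap in urllib.request.urlopen; the shorter URL list here is only for illustration:

from multiprocessing.dummy import Pool as ThreadPool
from urllib.request import urlopen

urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.python.org/doc/',
]

pool = ThreadPool(4)

# Fetch the URLs in worker threads and collect the response objects
results = pool.map(urlopen, urls)

pool.close()
pool.join()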

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

If you are using an earlier version of Python, you can pass multiple arguments via this workaround.
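
To make the starmap snippets above concrete, here is a minimal self-contained sketch; the add function and the two lists are hypothetical stand-ins:

import itertools
from multiprocessing.dummy import Pool as ThreadPool

def add(a, b):
    return a + b

list_a = [1, 2, 3]
list_b = [10, 20, 30]
constant = 100

with ThreadPool(4) as pool:
    # Pair the two lists element-wise: add(1, 10), add(2, 20), ...
    sums = pool.starmap(add, zip(list_a, list_b))                          # [11, 22, 33]

    # Pair a constant with every element of one list
    offsets = pool.starmap(add, zip(itertools.repeat(constant), list_a))   # [101, 102, 103]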

(Thanks to user136036 for the helpful comment.)

Here's a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, to put its contents on the queue; each thread is a daemon (won't keep the process up if the main thread ends -- that's more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they're daemon threads).

Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically threadsafe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.

NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).

However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let's consider the problem of summing a large range by summing subranges in parallel:

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

Note that the above is a very stupid example, as it does absolutely no I/O and will be executed serially, albeit interleaved (with the added overhead of context switching), in CPython due to the global interpreter lock.

Like others mentioned, CPython can use threads only for I/O waits due to the GIL.

If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
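
To actually spread CPU-bound work over several cores, a Pool of worker processes is the usual next step. A minimal sketch, where cube is just a stand-in for real CPU work:

from multiprocessing import Pool

def cube(x):
    return x ** 3

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # Each item is handled in a separate worker process,
        # so the GIL does not serialize the computation
        print(pool.map(cube, range(10)))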

Just a note: A queue is not required for threading.

This is the simplest example I could imagine that shows several threads running concurrently.

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).

Updated: works in both Python 2 and Python 3

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except ImportError:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()

I found this very useful: create as many threads as cores and let them execute a (large) number of tasks (in this case, calling a shell program):

import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
     t = threading.Thread(target=worker)
     t.daemon = True
     t.start()

q.join() # Block until all tasks are done

Given a function, f, thread it like this:

import threading
threading.Thread(target=f).start()

To pass arguments to f:

threading.Thread(target=f, args=(a,b,c)).start()
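
A complete minimal sketch, with a hypothetical function f and a join() so the main thread waits for the result:

import threading

def f(a, b, c):
    print(a, b, c)

t = threading.Thread(target=f, args=(1, 2, 3))
t.start()
t.join()   # Block until f has finished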

Python 3 has the facility of launching parallel tasks. This makes our work easier.

It has thread pooling and process pooling.

The following gives an insight:

ThreadPoolExecutor Example (source)

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

ProcessPoolExecutor Example (source)

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

I saw a lot of examples here where no real work was being performed, and they were mostly CPU-bound. Here is an example of a CPU-bound task that computes all prime numbers between 10 million and 10.05 million. I have used all four methods here:

import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def time_stuff(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """
    Compute a list of prime numbers between the given minimum and maximum arguments
    """
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check if any number from 2 to the square root + 1 divides the current number under consideration
        for number in range(2, sqrt_n + 1):

            # If divisible we have found a factor, hence this is not a prime number, so let's move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes that we have found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """
    Use the main process and main thread to compute everything in this case
    """
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
    If the minimum is 1000 and the maximum is 2000 and we have four workers,
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    so let’s split the minimum and maximum values according to the number of workers
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)

        # Start the thread with the minimum and maximum split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target = find_primes_in, args = (start, end))
        threads.append(t)
        t.start()

    # Don’t forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """
    Split the minimum, maximum interval similar to the threading method above, but use processes this time
    """
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target = find_primes_in, args = (start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use a thread pool executor this time.
    This method is slightly faster than using pure threading as the pools manage threads more efficiently.
    This method is still slow due to the GIL limitations since we are doing a CPU-bound task.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)
if __name__ == "__main__":
    main()

Here are the results on my Mac OS X four-core machine:

Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds

Using the blazing new concurrent.futures module:

def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())

The executor approach might seem familiar to all those who have gotten their hands dirty with Java before.

Also on a side note: To keep the universe sane, don't forget to close your pools/executors if you don't use a with context (which is so awesome that it does it for you).
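
If you do create an executor without with, you can shut it down explicitly. A minimal sketch of the manual form:

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
future = executor.submit(pow, 2, 10)
print(future.result())        # 1024

# This is what the with-block does for you on exit
executor.shutdown(wait=True)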

For me, the perfect example for threading is monitoring asynchronous events. Look at this code.

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3;

You can play with this code by opening an IPython session and doing something like:

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>> a[0] = 2
Mon = 2

Wait a few minutes:

>>> a[0] = 2
Mon = 2
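
Note that the monitor above busy-waits, burning CPU while it polls the list. A sketch of an alternative that blocks on a threading.Event until it is signalled (Python 3 syntax):

import threading
import time

event = threading.Event()

def monitor():
    while True:
        event.wait()          # Blocks until event.set() is called
        print("Mon = 2")
        event.clear()

t = threading.Thread(target=monitor, daemon=True)
t.start()

event.set()                   # Signal the monitor thread
time.sleep(0.1)               # Give it a moment to react before the script exits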

Most documentation and tutorials use Python's threading and Queue modules, and they could seem overwhelming for beginners.

Perhaps consider concurrent.futures.ThreadPoolExecutor from Python 3 instead.

Combined with the with clause and a list comprehension, it could be a real charm.

from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Using threading.Lock() if necessary
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers = 5) as executor:

    # Create threads
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() gives you the threads once finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()

Borrowing from this post, we know about choosing between multithreading, multiprocessing, and async/asyncio and their usage.

Python 3 has a new built-in library for concurrency and parallelism: concurrent.futures.

So I'll demonstrate, through an experiment, running four tasks (i.e., the .sleep() method) with a thread pool:

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time

def concurrent(max_worker):
    futures = []
    tic = time()
    with ThreadPoolExecutor(max_workers=max_worker) as executor:
        futures.append(executor.submit(sleep, 2))  # Two seconds sleep
        futures.append(executor.submit(sleep, 1))
        futures.append(executor.submit(sleep, 7))
        futures.append(executor.submit(sleep, 3))
        for future in as_completed(futures):
            if future.result() is not None:
                print(future.result())
    print(f'Total elapsed time by {max_worker} workers:', time()-tic)

concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)

Output:

Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507

[NOTE]:

  • As you can see in the above results, the best case was 3 workers for those four tasks.
  • If you have a processing task instead of an I/O-bound or blocking one (multiprocessing instead of threading), you can change the ThreadPoolExecutor to ProcessPoolExecutor, as sketched below.
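
A minimal sketch of that swap for a CPU-bound task; busy_work is just a stand-in for real computation:

from concurrent.futures import ProcessPoolExecutor
from time import time

def busy_work(n):
    # Pure CPU work: sum of squares
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    tic = time()
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(busy_work, [10_000_000] * 4))
    print('Total elapsed time:', time() - tic)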

Here is a very simple example of CSV import using threading. (Library imports may differ depending on your purpose.)

Helper Functions:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query goes here
                pass

Driver Function:

import_handler(csv_file_name)

I would like to contribute a simple example and the explanations I've found useful when I had to tackle this problem myself.

In this answer you will find some information about Python's GIL (global interpreter lock) and a simple day-to-day example written using multiprocessing.dummy, plus some simple benchmarks.

Global Interpreter Lock (GIL)

Python doesn't allow multi-threading in the truest sense of the word. It has a multi-threading package, but if you want to multi-thread to speed your code up, then it's usually not a good idea to use it.

Python has a construct called the global interpreter lock (GIL). The GIL makes sure that only one of your 'threads' can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL on to the next thread.

This happens very quickly, so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core.

All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn't a good idea.

There are reasons to use Python's threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it's totally fine and convenient. Or if you are running code that needs to wait for something (like some I/O) then it could make a lot of sense. But the threading library won't let you use extra CPU cores.

Multi-threading can be outsourced to the operating system (by doing multi-processing), to some external application that calls your Python code (for example, Spark or Hadoop), or to some code that your Python code calls (for example: you could have your Python code call a C function that does the expensive multi-threaded stuff).

Why This Matters

Because lots of people spend a lot of time trying to find bottlenecks in their fancy Python multi-threaded code before they learn what the GIL is.

Once this information is clear, here's my code:

#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE,Popen
import time
import os

# In the variable pool_size we define the "parallelness".
# For CPU-bound tasks, it doesn't make sense to create more Pool processes
# than you have cores to run them on.
#
# On the other hand, if you are using I/O-bound tasks, it may make sense
# to create quite a few more Pool processes than cores, since the processes
# will probably spend most their time blocked (waiting for I/O to complete).
pool_size = 8

def do_ping(ip):
    if os.name == 'nt':
        print ("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print ("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]


os.system('cls' if os.name=='nt' else 'clear')
print ("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Now we do the same without threading, just to compare time
print ("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print ("\nOutput aggregated in a Dictionary:")
print (output)
print ("\n")

print ("\nPretty printed output: ")
for key, value in output.items():
    print (key + "\n")
    print (value)

Here is multi-threading with a simple example which will be helpful. You can run it and easily understand how multi-threading works in Python. I used a lock to prevent other threads from getting access until the previous threads finished their work. By the use of this line of code,

tLock = threading.BoundedSemaphore(value=4)

you can allow a number of threads at a time, while the rest are held back and will run later, after the previous ones have finished.

import threading
import time

#tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)
def timer(name, delay, repeat):
    print  "\r\nTimer: ", name, " Started"
    tLock.acquire()
    print "\r\n", name, " has the acquired the lock"
    while repeat > 0:
        time.sleep(delay)
        print "\r\n", name, ": ", str(time.ctime(time.time()))
        repeat -= 1

    print "\r\n", name, " is releaseing the lock"
    tLock.release()
    print "\r\nTimer: ", name, " Completed"

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print "\r\nMain Complete"

if __name__ == "__main__":
    Main()

None of the previous solutions actually used multiple cores on my GNU/Linux server (where I don't have administrator rights). They just ran on a single core.

I used the lower-level os.fork interface to spawn multiple processes. This is the code that worked for me:

from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        my_function(values[i])
        break
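
In the snippet above, the parent does not wait for its children, and each child keeps executing whatever follows the loop after the break. A slightly more careful sketch (still POSIX-only, with my_function assumed to be defined elsewhere) exits each child explicitly and waits for all of them:

import os

values = ['different', 'values', 'for', 'threads']

children = []
for value in values:
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, then exit without running the rest of the loop
        my_function(value)
        os._exit(0)
    children.append(pid)

# Parent process: wait for every child to finish
for pid in children:
    os.waitpid(pid, 0)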

import threading

# myHeavyFunction and foreground are placeholders for your own functions, defined elsewhere
myHeavyFctThread = threading.Thread(name='myHeavyFunction', target=myHeavyFunction)
f = threading.Thread(name='foreground', target=foreground)

Pass your own function's name as target instead of myHeavyFunction, and when you need to activate the thread:

myHeavyFctThread.start()

I know it's late, but it might help someone :D

Here is a Python 3 version of the second answer:

import queue as Queue
import threading
import urllib.request

# Called by each thread
def get_url(q, url):
    q.put(urllib.request.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com", "http://www.python.org","https://wiki.python.org/moin/"]

q = Queue.Queue()
def thread_func():
    for u in theurls:
        t = threading.Thread(target=get_url, args = (q,u))
        t.daemon = True
        t.start()

    s = q.get()
    
def non_thread_func():
    for u in theurls:
        get_url(q,u)
        

    s = q.get()
   

And you can test it:

import time

start = time.time()
thread_func()
end = time.time()
print(end - start)

start = time.time()
non_thread_func()
end = time.time()
print(end - start)

non_thread_func() should take about 4 times as long as thread_func().

import threading
import requests

def send():

  r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send)
thread.append(t)
t.start()

It's very easy to understand. Here are two simple ways to do threading.

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading

def a(a=1, b=2):
    print(a)
    time.sleep(5)
    print(b)
    return a+b

def b(**kwargs):
    if "a" in kwargs:
        print("am b")
    else:
        print("nothing")
        
to_do=[]
executor = ThreadPoolExecutor(max_workers=4)
ex1=executor.submit(a)
to_do.append(ex1)
ex2=executor.submit(b, **{"a":1})
to_do.append(ex2)

for future in as_completed(to_do):
    print("Future {} and Future Return is {}\n".format(future, future.result()))

print("threading")

to_do=[]
to_do.append(threading.Thread(target=a))
to_do.append(threading.Thread(target=b, kwargs={"a":1}))

for threads in to_do:
    threads.start()
    
for threads in to_do:
    threads.join()

The code below can run 10 threads concurrently, printing the numbers from 0 to 99:

from threading import Thread

def test():
    for i in range(0, 100):
        print(i)

thread_list = []

for _ in range(0, 10):
    thread = Thread(target=test)
    thread_list.append(thread)

for thread in thread_list:
    thread.start()

for thread in thread_list:
    thread.join()

And the code below is the shorthand for-loop version of the above, running 10 threads concurrently and printing the numbers from 0 to 99:

from threading import Thread

def test():
    [print(i) for i in range(0, 100)]

thread_list = [Thread(target=test) for _ in range(0, 10)]

[thread.start() for thread in thread_list]

[thread.join() for thread in thread_list]

This is the result:

...
99
83
97
84
98
99
85
86
87
88
...

The easiest way of using threading/multiprocessing is to use higher-level libraries like autothread.

import autothread
from time import sleep as heavyworkload

@autothread.multithreaded() # <-- This is all you need to add
def example(x: int, y: int):
    heavyworkload(1)
    return x*y

Now, you can feed your functions lists of ints. Autothread will handle everything for you and just give you the results computed in parallel.

result = example([1, 2, 3, 4, 5], 10)
