
When to use threading and how many threads to use

I have a project for work. We had written a module, and there was a #TODO to implement threading to improve it. I'm a fairly new Python programmer and decided to take a whack at it. While learning and implementing the threading, I had a question similar to "How many threads is too many?", because we have a queue of maybe 6 objects that need to be processed. Why make 6 threads (or any threads at all) to process objects in a list or queue when the processing time is negligible anyway? (Each object takes at most about 2 seconds to process.)

So I ran a little experiment. I wanted to know if there were performance gains from using threading. See my Python code below:

import threading
import queue
import math
import time

results_total = []
results_calculation = []
results_threads = []

class MyThread(threading.Thread):
    def __init__(self, thread_id, q):
        threading.Thread.__init__(self)
        self.threadID = thread_id
        self.q = q

    def run(self):
        # print("Starting " + self.name)
        process_data(self.q)
        # print("Exiting " + self.name)


def process_data(q):
    while not exitFlag:
        queueLock.acquire()
        if not workQueue.empty():
            potentially_prime = True
            data = q.get()
            queueLock.release()
            # check if the data is a prime number
            # print("Testing {0} for primality.".format(data))
            for i in range(2, int(math.sqrt(data)+1)):
                if data % i == 0:
                    potentially_prime = False
                    break
            if potentially_prime is True:
                prime_numbers.append(data)
        else:
            queueLock.release()

for j in [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 250, 500,
          750, 1000, 2500, 5000, 10000]:
    threads = []
    numberList = list(range(1, 10001))
    queueLock = threading.Lock()
    workQueue = queue.Queue()
    numberThreads = j
    prime_numbers = list()
    exitFlag = 0

    start_time_total = time.time()
    # Create new threads
    for threadID in range(0, numberThreads):
        thread = MyThread(threadID, workQueue)
        thread.start()
        threads.append(thread)

    # Fill the queue
    queueLock.acquire()
    # print("Filling the queue...")
    for number in numberList:
        workQueue.put(number)
    queueLock.release()
    # print("Queue filled...")
    start_time_calculation = time.time()
    # Wait for queue to empty
    while not workQueue.empty():
        pass

    # Notify threads it's time to exit
    exitFlag = 1

    # Wait for all threads to complete
    for t in threads:
        t.join()
    # print("Exiting Main Thread")
    # print(prime_numbers)
    end_time = time.time()
    results_total.append(
            "The test took {0} seconds for {1} threads.".format(
                end_time - start_time_total, j)
            )
    results_calculation.append(
            "The calculation took {0} seconds for {1} threads.".format(
                    end_time - start_time_calculation, j)
            )
    results_threads.append(
            "The thread setup time took {0} seconds for {1} threads.".format(
                    start_time_calculation - start_time_total, j)
            )
for result in results_total:
    print(result)
for result in results_calculation:
    print(result)
for result in results_threads:
    print(result)

This test finds the prime numbers between 1 and 10000. The setup is taken almost directly from https://www.tutorialspoint.com/python3/python_multithreading.htm, but instead of printing a simple string I ask the threads to find prime numbers. This is not what my real-world application does, but I can't currently test the code I've written for the module, and I thought this was a good test to measure the effect of additional threads. My real-world application talks to multiple serial devices. I ran the test 5 times and averaged the times. Here are the results in a graph:

[Figure: test time vs. number of threads]

My questions regarding threading and this test are as follows:

  1. Is this test even a good representation of how threads should be used? This is not a server/client situation. In terms of efficiency, is it better to avoid parallelism when you aren't serving clients or dealing with assignments/work being added to a queue?

  2. If the answer to 1 is "No, this test isn't a place where one should use threads," then when is, generally speaking?

  3. If the answer to 1 is "Yes, it is OK to use threads in that case," why does adding threads end up taking longer and quickly reach a plateau? Or rather, why would one want to use threads at all when it takes many times longer than calculating in a loop?

I notice that as the work-to-threads ratio gets closer to 1:1, the time taken to set up the threads becomes longer. So is threading only useful where you create threads once and keep them alive as long as possible to handle requests that might be enqueued faster than they can be processed?

No, this is not a good place to use threads.

Generally, you want to use threads where your code is IO-bound; that is, it spends a significant amount of time waiting on input or output. An example might be downloading data from a list of URLs in parallel: the code can start requesting the data from the next URL while still waiting for the previous one to return.
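A minimal sketch of that IO-bound case, under stated assumptions: `fake_download` is a hypothetical stand-in that sleeps instead of making a real request, and the URLs are placeholders that are never contacted. It only illustrates that the waits overlap when the calls run in a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for a network request: waits, then returns."""
    time.sleep(0.2)  # simulated I/O wait; other threads can run meanwhile
    return "payload from " + url


# Placeholder URLs (assumption for illustration, never actually fetched).
urls = ["https://example.com/item/{}".format(i) for i in range(6)]

# One after another: the waits add up.
start = time.perf_counter()
sequential = [fake_download(u) for u in urls]
sequential_time = time.perf_counter() - start

# In a thread pool: the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    threaded = list(pool.map(fake_download, urls))  # keeps input order
threaded_time = time.perf_counter() - start

print("sequential: {:.2f} s".format(sequential_time))
print("threaded:   {:.2f} s".format(threaded_time))
```

The exact timings depend on the machine, but the sequential run cannot finish in less than the sum of the six simulated waits, while the threaded run can.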

That's not the case here; calculating primes is CPU-bound.
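As a rough way to see this for yourself, the sketch below counts primes once in a plain loop and once split across four threads. The names `is_prime`, `count_primes`, `LIMIT` and `CHUNKS` are made up for this illustration, and it assumes a standard CPython build with the GIL. Both runs produce the same count; on such a build the threaded run is usually no faster, though the timings will vary by machine.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division up to sqrt(n); pure Python, so it is CPU-bound."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


def count_primes(lo, hi):
    """Count the primes in the half-open range [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


LIMIT = 200_000   # illustrative problem size
CHUNKS = 4        # illustrative number of threads
step = LIMIT // CHUNKS
bounds = [(k * step, (k + 1) * step) for k in range(CHUNKS)]

start = time.perf_counter()
sequential_total = sum(count_primes(lo, hi) for lo, hi in bounds)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CHUNKS) as pool:
    threaded_total = sum(pool.map(lambda b: count_primes(*b), bounds))
threaded_time = time.perf_counter() - start

print("sequential: {} primes in {:.2f} s".format(sequential_total, sequential_time))
print("threaded:   {} primes in {:.2f} s".format(threaded_total, threaded_time))
```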

You're right to think that multithreading is a questionable move here, and for good reason. Multithreading is great, and in the right applications it can make a world of difference in running times.

On the other hand, it adds complexity to any program that implements it (especially in Python). There are also time penalties to consider, such as the cost of context switches and the time it takes to actually create a thread.
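If you want a feel for the thread-creation part of that cost on your own machine, a small measurement along these lines may help. This is only a sketch: `noop` and `time_threads` are hypothetical helpers written for this illustration, and the numbers depend heavily on the OS and hardware.

```python
import threading
import time


def noop():
    """Does nothing, so the measurement reflects thread overhead only."""
    pass


def time_threads(count):
    """Return seconds spent creating, starting and joining `count` threads."""
    start = time.perf_counter()
    threads = [threading.Thread(target=noop) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


for count in (1, 6, 100):
    elapsed = time_threads(count)
    print("{:>4} threads: {:.6f} s total".format(count, elapsed))
```

Comparing these figures with the time one unit of work takes gives a rough idea of whether the setup cost matters for your workload.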

These time penalties are negligible when your program has to process thousands upon thousands of resource-intensive tasks, because the time you save through multithreading far outweighs the little bit of time it takes to get the threads ready. For your case, though, I'm not sure your needs meet those requirements. I didn't look too deeply into what type of objects you are processing, but you stated that they only take about 2 seconds each, which isn't awful, and that you only have 6 items to process at a time. So we can expect the main part of your script to run for about 12 seconds at most. In my opinion, that does not call for multithreading: it takes some time to get the threads ready and pass the instructions to them, whereas a single-threaded script can simply start processing right away.

In short, I would hold off on multithreading unless you need it. For example, huge datasets like those used for gene sequencing (a big thing in Python) benefit greatly from it, because multiple threads can help process these massive files concurrently. In your case, it doesn't look like the ends justify the means. Hope this helps.

Threading in Python is used to run multiple threads (tasks, function calls) at the same time. Note that this does not mean that they are executed on different CPUs. Python threads will NOT make your program faster if it already uses 100% CPU time. In that case, you probably want to look into parallel programming.

from: https://en.wikibooks.org/wiki/Python_Programming/Threading

This is due to a mechanism called the GIL (global interpreter lock). As Daniel pointed out, threads in Python are only useful when you have IO-bound code. But then again, for IO-bound code it may be better to use lighter-weight tasks running on top of an event loop (using gevent, eventlet, asyncio or similar), as you can then easily run hundreds (or more) of concurrent operations with very little per-task overhead.
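A small sketch of the event-loop approach using asyncio from the standard library. Here `fake_request` is a hypothetical coroutine that awaits a sleep in place of real I/O, and 500 is an arbitrary count chosen for illustration. All of the operations run concurrently on one event loop in a single OS thread.

```python
import asyncio


async def fake_request(i):
    """Hypothetical I/O operation: awaits a short sleep instead of real I/O."""
    await asyncio.sleep(0.1)
    return i * 2


async def main(n):
    # gather runs all n coroutines concurrently on the same event loop
    # and returns their results in the order they were passed in.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


results = asyncio.run(main(500))
print(len(results), "operations completed")
```

Because the coroutines only yield at `await` points, this style fits code that spends its time waiting; it does not help with CPU-bound work such as the prime test.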

If what you want is to use more than one CPU core to speed up execution, take a look at the multiprocessing module.
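For the prime-number test from the question, a multiprocessing version might look roughly like the sketch below. The helper names `is_prime` and `count_primes_parallel`, the process count and the chunk size are assumptions made for illustration, not taken from the question's code. For a range as small as 1 to 10000, the cost of starting the worker processes may well exceed the time saved.

```python
import math
from multiprocessing import Pool


def is_prime(n):
    """Trial division up to sqrt(n); returns False for numbers below 2."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


def count_primes_parallel(limit, processes=4):
    """Count primes in 1..limit using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # chunksize batches the numbers to reduce inter-process traffic
        flags = pool.map(is_prime, range(1, limit + 1), chunksize=1000)
    return sum(flags)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the module.
    print(count_primes_parallel(10000))
```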
