
Simple multi-threading confusion for C++

I am developing a C++ application in Qt. I have a very basic question, so please forgive me if it is too naive...

How many threads should I create to divide a task among them so that it finishes in the minimum time?

I am asking because my laptop has a 3rd-generation i5 processor (3210M). It is dual core, yet the NO_OF_PROCESSORS environment variable shows 4. I had read in an article that dynamic memory for an application is only available to the processor that launched that application. So should I create only 1 thread (since the env variable says 4 processors), or 2 threads (since my processor is dual core and the env variable might be suggesting the number of cores), or 4 threads (if that article was wrong)? Please forgive me, since I am a beginner-level programmer trying to learn Qt. Thank you :)

Although hyperthreading is somewhat of a lie (you're told that you have 4 cores, but you really only have 2 cores, plus another two that only run on whatever resources the former two don't use, if there is such a thing), the correct thing to do is still to use as many threads as NO_OF_PROCESSORS tells you.

Note that Intel isn't the only one lying to you. It's even worse on recent AMD processors, where you have 6 alleged "real" cores but in reality only 4 of them, with resources shared among them.

However, most of the time it more or less works out. Even when no thread blocks explicitly (on a wait function or a blocking read), there's always a point where a core is stalled, for example while accessing memory after a cache miss, which frees up resources that the hyperthreaded core can use.

Therefore, if you have a lot of work to do and you can parallelize it nicely, you should really have as many workers as there are advertised cores (whether they're "real" or "hyper"). This way, you make maximum use of the available processor resources.

Ideally, one would create worker threads early at application startup and have a task queue to hand tasks to the workers. Since the cost of synchronization is often non-negligible, the tasks in the queue should be rather "coarse". There is a tradeoff between maximum core usage and synchronization overhead.

For example, if you have 10 million elements in an array to process, you might push tasks that each refer to 100,000 or 200,000 consecutive elements (you will not want to push 10 million tasks!). That way, you make sure that no cores stay idle on average (if one finishes earlier, it pulls another task instead of doing nothing) and you only have a hundred or so synchronizations, the overhead of which is more or less negligible.
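The chunking idea can be sketched roughly as follows. This is only an illustration, not the answerer's code: `process_in_chunks` is a hypothetical helper, the "task queue" is reduced to a shared atomic index of the next unclaimed chunk, and the workers are created per call for brevity rather than once at application startup.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: applies `op` to every element of `data`, handing out
// the work in chunks of `chunk_size` consecutive elements.
template <typename T, typename Op>
void process_in_chunks(std::vector<T>& data, std::size_t chunk_size, Op op) {
    assert(chunk_size > 0);

    // Stand-in for a task queue: the start index of the next unclaimed chunk.
    std::atomic<std::size_t> next{0};

    unsigned n_workers = std::thread::hardware_concurrency();
    if (n_workers == 0) n_workers = 2;  // the call may return 0 if unknown

    auto worker = [&] {
        for (;;) {
            // One synchronization per chunk, not one per element.
            const std::size_t begin = next.fetch_add(chunk_size);
            if (begin >= data.size()) return;  // nothing left to claim
            const std::size_t end = std::min(begin + chunk_size, data.size());
            for (std::size_t i = begin; i < end; ++i) op(data[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < n_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

With 10 million elements and a chunk size of 100,000, the workers perform about a hundred `fetch_add` operations in total, and a worker that finishes its chunk early simply claims the next one.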

If tasks involve file/socket reads or other things that can block for an indefinite time, spawning another 1-2 threads is often no mistake (it takes a bit of experimentation).

This totally depends on your workload. If you have a workload that is very CPU intensive, you should stay close to the number of threads your CPU has (4 in your case: 2 cores * 2 for hyperthreading). A small oversubscription might also be OK, as that can compensate for times when one of your threads waits for a lock or something else.
On the other hand, if your application is not CPU bound and is mostly waiting, you can even create more threads than your CPU count. You should, however, be aware that thread creation can be quite an overhead. The only solution is to measure where your bottleneck is and optimize in that direction.

Also note that if you are using C++11, you can use std::thread::hardware_concurrency as a portable way to determine the number of hardware threads your CPU supports. The value is only a hint and may be 0 if it cannot be determined.
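A minimal sketch of how this might be used. The helper name `worker_count` and the fallback value of 2 are assumptions made for illustration:

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: how many worker threads to start.
unsigned worker_count(unsigned fallback = 2) {
    assert(fallback > 0);
    // hardware_concurrency() is only a hint and may return 0 when the
    // value is not computable, so keep a fallback.
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```

On the asker's machine this would be expected to report 4 (2 cores with hyperthreading). In a Qt application, QThread::idealThreadCount() serves a similar purpose.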

Concerning your question about dynamic memory, you must have misunderstood something there. Generally, all threads you create can access the memory you allocated in your application. In addition, this has nothing to do with C++ and is outside the scope of the C++ standard.

NO_OF_PROCESSORS shows 4 because your CPU has Hyper-threading. Hyper-threading is the Intel trademark for technology that enables a single core to execute 2 threads of the same application more or less at the same time. It works as long as, e.g., one thread is fetching data while the other one is using the ALU. If both need the same resource and instructions can't be reordered, one thread will stall. This is the reason you see 4 cores, even though you have 2.

The claim that dynamic memory is only available to one of the cores is, IMO, not quite right; register contents, and sometimes cache contents, are. Everything that resides in RAM should be available to all CPUs.

More threads than CPUs can help, depending on how your operating system's scheduler works, how you access data, etc. To find out, you'll have to benchmark your code. Everything else will just be guesswork.

Apart from that, if you're trying to learn Qt, this is maybe not the right thing to worry about...

Edit:

Answering your question: we can't really tell you how much slower or faster your program will run if you increase the number of threads. This will change depending on what you are doing. If you are, e.g., waiting for responses from the network, you could increase the number of threads much more. If your threads are all using the same hardware, 4 threads might not perform better than 1. The best way is simply to benchmark your code.

In an ideal world, if you are 'just' crunching numbers, it should not make a difference whether you have 4 or 8 threads running: the net time should be the same (neglecting the time for context switches etc.), only the response time will differ. The thing is that nothing is ideal: we have caches, and your CPUs all access the same memory over the same bus, so in the end they compete for access to resources. Then you also have an operating system that might or might not schedule a thread/process at a given time.

You also asked for an explanation of synchronization overhead: if all your threads access the same data structures, you will have to do some locking etc. so that no thread accesses the data in an invalid state while it is being updated.

Assume you have two threads, both doing the same thing:

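int sum = 0; // global variable, shared by all threads

void thread() {
    int i = sum; // read the shared value into a local copy
    i += 1;      // modify the local copy
    sum = i;     // write it back; another thread may have written in between
}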

If you start two threads doing this at the same time, you cannot reliably predict the output. It might happen like this:

THREAD A : i = sum; // i = 0
           i += 1;  // i = 1
**context switch**
THREAD B : i = sum; // i = 0
           i += 1;  // i = 1
           sum = i; // sum = 1
**context switch**
THREAD A : sum = i; // sum = 1

In the end, sum is 1, not 2, even though you started the thread twice. To avoid this, you have to synchronize access to sum, the shared data. Normally you would do this by locking access to sum for as long as needed. Synchronization overhead is the time that threads spend waiting, doing nothing, until the resource is unlocked again.
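One common way to synchronize the example is a mutex, sketched below with C++11 facilities. The names `sum_mutex`, `increment` and `run_two_threads` are made up for illustration; the answer itself does not prescribe a particular mechanism.

```cpp
#include <mutex>
#include <thread>

int sum = 0;           // shared data
std::mutex sum_mutex;  // guards every access to `sum`

void increment() {
    // Only one thread at a time can hold the lock, so the
    // read-modify-write below can no longer be interleaved.
    std::lock_guard<std::mutex> lock(sum_mutex);
    int i = sum;
    i += 1;
    sum = i;
}  // the lock is released here

// Hypothetical driver: runs increment() on two threads and returns the result.
int run_two_threads() {
    sum = 0;
    std::thread a(increment);
    std::thread b(increment);
    a.join();
    b.join();
    return sum;
}
```

While one thread holds the lock, the other has to wait; that waiting time is the synchronization overhead described above. For a single counter like this, std::atomic<int> would be a lighter alternative.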

If you have discrete work packages for each thread and no shared resources, you should have no synchronization overhead.
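As a rough sketch of such discrete work packages, each thread below sums its own slice of an array and writes only to its own result slot, so no locking is needed; the partial results are combined after all threads have been joined. `parallel_sum` is a hypothetical helper, not something from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `n_threads` independent work packages.
long long parallel_sum(const std::vector<int>& data, unsigned n_threads) {
    assert(n_threads > 0);
    std::vector<long long> partial(n_threads, 0);  // one result slot per thread
    const std::size_t slice = (data.size() + n_threads - 1) / n_threads;

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t begin = std::min(t * slice, data.size());
            const std::size_t end = std::min(begin + slice, data.size());
            // Accumulate locally, then write this thread's own slot once.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& th : pool) th.join();

    // Combine the results only after every thread has finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The only synchronization points are the join() calls at the end.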

The easiest way to get started with dividing work among threads in Qt is to use the Qt Concurrent framework. Example: you have some operation that you want to perform on every item in a QList (pretty common).

void operation( ItemType & item )
{
  // do work on item, changing it in place
}

QList<ItemType> seq;  // populate your list

// apply operation to every member of seq
QFuture<void> future = QtConcurrent::map( seq, operation );
// if you want to wait until all operations are complete before you move on...
future.waitForFinished();

Qt handles the threading automatically... no need to worry about it. The QFuture documentation describes how you can handle the map completion asynchronously with signals and slots if you need to do that.

