No speedup with Boost.Thread?

Question

I have a small program that implements a Monte Carlo simulation of blackjack using various card-counting strategies. My main function basically does this:

int bankroll = 50000;
int hands = 100;
int tests = 10000;
Simulation::strategy = hi_lo;

for(int i = 0; i < simulations; ++i)
   runSimulation(bankroll, hands, tests, strategy);

The entire program, run in a single thread on my machine, takes about 10 seconds.

I wanted to take advantage of the 3 cores my processor has, so I decided to rewrite the program to simply execute the various strategies in separate threads like this:

int bankroll = 50000;
int hands = 100;
int tests = 10000;
Simulation::strategy = hi_lo;
boost::thread threads[simulations];

for(int i = 0; i < simulations; ++i)
   threads[i] = boost::thread(boost::bind(runSimulation, bankroll, hands, tests, strategy));

for(int i = 0; i < simulations; ++i)
   threads[i].join();

However, when I ran this program, even though I got the same results, it took around 24 seconds to complete. Did I miss something here?

Answer 1

If the value of simulations is high, you end up creating a lot of threads, and the overhead of doing so can wipe out any possible performance gain.

EDIT: One approach might be to start just three threads and let each of them run 1/3 of the desired simulations. Alternatively, a thread pool of some kind could also help.
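As a rough sketch of the "few threads, each running a share" idea: the code below uses std::thread rather than Boost.Thread to stay self-contained, and its runSimulation is a hypothetical stand-in that only counts calls, since the real function isn't shown in the question. The names completed and runInChunks are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's runSimulation; it only counts calls.
std::atomic<int> completed{0};
void runSimulation(int /*bankroll*/, int /*hands*/, int /*tests*/) {
    completed.fetch_add(1, std::memory_order_relaxed);
}

// Run `simulations` calls on `workers` threads. Each thread gets an equal
// share, and the first `simulations % workers` threads take one extra call.
void runInChunks(int simulations, int workers) {
    assert(simulations >= 0 && workers > 0);
    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (int w = 0; w < workers; ++w) {
        int share = simulations / workers + (w < simulations % workers ? 1 : 0);
        threads.emplace_back([share] {
            for (int i = 0; i < share; ++i)
                runSimulation(50000, 100, 10000);
        });
    }
    for (std::thread& t : threads)
        t.join();
}
```

With 3 cores you would call something like runInChunks(simulations, 3), so only three threads are ever created no matter how large simulations is.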

Answer 2

This is a good candidate for a work queue backed by a thread pool. I have used Intel Threading Building Blocks (TBB) for such requirements. A handcrafted thread pool also works for quick hacks. On Windows, the OS provides a nice thread-pool-backed work queue, QueueUserWorkItem().
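For the "handcrafted" option, a very small sketch might look like the following. It is not TBB and not QueueUserWorkItem, just a fixed set of std::thread workers draining a shared list of tasks through an atomic index; runOnPool is a hypothetical name, and a real pool would usually also accept work submitted after it has started.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run every task in `tasks` on a fixed set of `workers` threads. Each worker
// repeatedly claims the next unclaimed index from a shared atomic counter
// and stops once all indices have been handed out.
void runOnPool(const std::vector<std::function<void()>>& tasks, unsigned workers) {
    assert(workers > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&tasks, &next] {
            for (;;) {
                std::size_t i = next.fetch_add(1);
                if (i >= tasks.size()) break;
                tasks[i]();
            }
        });
    }
    for (std::thread& t : pool)
        t.join();
}
```

Because workers pull tasks one at a time, a thread that finishes early simply picks up more work, which helps when individual simulations take different amounts of time.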

Answer 3

Read these articles by Herb Sutter. You are probably a victim of "false sharing".

http://drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=214100002

http://drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=217500206
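To give an idea of what avoiding false sharing can look like in code, here is a minimal sketch. It is not taken from the articles; Padded, kCacheLine and perThreadTotals are hypothetical names, and the 64-byte line size is an assumption that holds on most x86 machines but is not guaranteed.

```cpp
#include <cstddef>

// Assumed cache-line size; 64 bytes is common on x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Wrapper that aligns each element to its own cache line. The alignment also
// rounds sizeof(Padded<T>) up to a multiple of kCacheLine, so neighbouring
// array elements can never share a line.
template <typename T>
struct alignas(kCacheLine) Padded {
    T value{};
};

// One running total per thread. Without the wrapper, three adjacent longs
// would sit on the same line and every write would disturb the other threads.
Padded<long> perThreadTotals[3];
```

Each thread would then update only perThreadTotals[its own index].value, and the totals get combined after the threads are joined.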

Answer 4

I'm late to this party, but wanted to note two things for others who come across this post:

1) Definitely see the second Herb Sutter link that David points out (http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206). It solved the problem that brought me to this question. It outlines a struct wrapper for data objects that ensures separate parallel threads aren't competing for data located on the same cache line (when one core writes to a cache line, the hardware invalidates the other cores' copies of it, so threads writing to neighbouring data keep stalling one another).

2) Re the original question: dlev points out a large part of the problem, but since it's a simulation I bet there's a deeper issue slowing things down. While none of your program's high-level variables are shared, you probably have one critical system variable that is shared: the system-level "last random number" that's stored under the hood and used to create the next random number. You might even be initializing dedicated generator objects for each simulation, but if they're making calls to a function like rand(), then they, and by extension their threads, are making repeated calls to the same shared system resource and blocking one another.

Solutions to issue #2 would depend on the structure of the simulation program itself. For instance, if calls to a random generator are scattered through the code, I'd probably batch them into one upfront call that retrieves and stores everything the simulation will need. And this has me wondering now about more sophisticated approaches that would deal with the underlying shared-resource issue in random generation...
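One way to sidestep the shared generator state, offered as a sketch rather than as what the original program should necessarily do, is to give every thread its own generator object from the C++11 <random> header instead of calling rand(). The function names, the 1..13 "card rank" range and the seeds below are made up for illustration.

```cpp
#include <cassert>
#include <random>
#include <thread>
#include <vector>

// Draw `draws` card ranks (1..13) from a generator owned by the caller.
// `seed` is a hypothetical per-thread seed so that the streams differ.
long sumOfDraws(unsigned seed, int draws) {
    std::mt19937 rng(seed);  // private state: nothing shared, nothing locked
    std::uniform_int_distribution<int> rank(1, 13);
    long total = 0;
    for (int i = 0; i < draws; ++i)
        total += rank(rng);
    return total;
}

// Run one independent random stream per thread and collect the totals.
std::vector<long> runStreams(int threads, int draws) {
    assert(threads > 0 && draws >= 0);
    std::vector<long> totals(threads, 0);
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&totals, t, draws] {
            totals[t] = sumOfDraws(1000u + t, draws);  // written once, at the end
        });
    for (std::thread& th : pool)
        th.join();
    return totals;
}
```

Each thread touches only its own std::mt19937, so the threads never contend on hidden library state. Whether fixed seeds like these are statistically adequate for a given simulation is a separate question.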

Answer 5

I agree with dlev. If your function runSimulation does not change anything that the next call to runSimulation needs in order to work properly, then you can do something like:

- Divide "simulations" by 3.

- Now you will have 3 counters: "0 to simulations/3", "(simulations/3 + 1) to 2*simulations/3" and "(2*simulations)/3 + 1 to simulations".

All three counters can be used in three different threads simultaneously.
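The counter ranges above are easy to get off by one, so here is a small sketch of computing them. It uses half-open ranges [begin, end) rather than the inclusive bounds written above, and splitRange is a hypothetical helper name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Split [0, total) into `parts` half-open ranges [begin, end) whose sizes
// differ by at most one. Every index is covered exactly once.
std::vector<std::pair<int, int>> splitRange(int total, int parts) {
    assert(total >= 0 && parts > 0);
    std::vector<std::pair<int, int>> ranges;
    ranges.reserve(parts);
    int begin = 0;
    for (int p = 0; p < parts; ++p) {
        int size = total / parts + (p < total % parts ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```

For example, splitRange(10, 3) gives [0, 4), [4, 7) and [7, 10), and each thread would loop over one of those ranges.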

**NOTE:** This approach might not suit your requirements at all if you have to lock shared data.

5 answers

Answer 1 5 2011-05-20 03:34:04
Answer 2 2 2011-05-20 03:40:14
Answer 3 1 2011-05-20 14:42:40
Answer 4 0 2012-08-12 22:03:33
Answer 5 0 2011-05-20 03:51:40
