
Should I use every core when doing parallel processing in R?

I'm using R to convert some shapefiles. R does this using just one core of my processor, and I want to speed it up using parallel processing. So I've parallelized the process like this.

Given files, which is a list of the files to convert:

library(doMC)
registerDoMC()  # by default, registers half of the detected cores

foreach(f = files) %dopar% {
  # Code to do the conversion
}

This works just fine and it uses 2 cores. According to the documentation for registerDoMC(), by default that function uses half the cores detected by the parallel package.

My question is: why should I use half of the cores instead of all of them? (In this case, 4 cores.) By using registerDoMC(detectCores()) I can use all the cores on my system. What, if any, are the downsides to doing this?
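For reference, a minimal sketch of registering every detected core and checking the result; getDoParWorkers() comes with the foreach package that doMC loads:

library(doMC)

registerDoMC(cores = detectCores())  # use all cores, 4 in this case
getDoParWorkers()                    # reports how many workers are registered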

Besides the question of scalability, there is a simple rule: Intel Hyperthreading cores do not help, at least under Windows. So I get 8 from detectCores(), but I never found an improvement when going beyond 4 cores, even with MCMC parallel threads, which in general scale perfectly.

If someone has a case (under Windows) where there is such an improvement from Hyperthreading, please post it.

Any time you do parallel processing there is some overhead (which can be nontrivial, especially with locking data structures and blocking calls). For small batch jobs, running on one or two cores is much faster, because you're not paying that overhead.

I don't know the size of your job, but you should probably run some scaling experiments where you time your job on 1 processor, 2 processors, 4 processors, 8 processors, and so on until you hit the max core count for your system (typically, you double the processor count each time). EDIT: It looks like you're only using 4 cores, so time with 1, 2, and 4.

Run timing results for ~32 trials for each core count and get a confidence interval; then you can say for certain whether running on all cores is right for you. If your job takes a long time, reduce the number of trials, down to 5 or so, but remember that more trials will give you a higher degree of confidence.
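A sketch of such a scaling experiment in R, assuming convert_file() is a hypothetical wrapper around converting one shapefile and files is the list from the question:

library(doMC)

core_counts <- c(1, 2, 4)  # the counts to compare on a 4-core machine
n_trials    <- 32          # reduce this for long-running jobs

timings <- lapply(core_counts, function(n) {
  registerDoMC(cores = n)
  replicate(n_trials,
    system.time(
      foreach(f = files) %dopar% convert_file(f)  # convert_file is hypothetical
    )[["elapsed"]]
  )
})
names(timings) <- core_counts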

To elaborate:

Student's t-test:

The Student's t-test essentially says: "you calculated an average time for this core count, but that's not the true average. We could only get the true average if we had an infinite number of data points. The true average actually lies in some interval around your computed average."

The t-test for significance then basically compares the intervals around the true averages of two sets of measurements and says whether they are significantly different or not. So one average time may be less than another, but if the standard deviation is sufficiently high, we can't say for certain that it's actually less; the true averages may be identical.

So, to compute this test for significance:

  • Run your timing experiments.
  • For each core count, compute your mean and standard deviation. The standard deviation should be the population standard deviation, which is the square root of the population variance: (1/N) * summation_over_all_data_points((datapoint_i - mean)^2). (A sketch of this computation follows the list.)
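A sketch of that computation in R for one core count's vector of timings; note that base R's sd() divides by N - 1 (the sample standard deviation), so the population version is written out directly:

pop_sd <- function(x) {
  m <- mean(x)
  sqrt(mean((x - m)^2))  # square root of (1/N) * sum((x_i - m)^2)
}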

Now you will have a mean and standard deviation for each core count: (m_1, s_1), (m_2, s_2), and so on. For every pair of core counts, compute a t-value: t = (m_1 - m_2) / (s_1 / sqrt(#dataPoints)).

The example t-value I showed tests whether the mean timing result for a core count of 1 is significantly different from the timing results for a core count of 2. You could test the other way around by saying:

t = (m_2 - m_1) / (s_2 / sqrt(#dataPoints))
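Expressed in R (a sketch following the author's formula above, not the pooled two-sample formula; pop_sd() comes from the earlier sketch, and x1, x2 are two equal-length timing vectors):

t_value <- function(x1, x2) {
  (mean(x1) - mean(x2)) / (pop_sd(x1) / sqrt(length(x1)))
}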

After you have computed these t-values, you can tell whether they're significant by looking at a critical value table. Before you do, you need to know about two more things:

Degrees of Freedom

This is related to the number of data points you have. The more data points you have, the smaller the interval around the mean probably is. Degrees of freedom roughly measures your computed mean's ability to move about, and it is #dataPoints - 1 (v in the link I provided).

Alpha

Alpha is a probability threshold. In the Gaussian (normal, bell-shaped) distribution, alpha cuts the bell curve on both the left and the right. Any probability in the middle of the cutoffs falls inside the threshold and is an insignificant result. A lower alpha makes it harder to get a significant result: alpha = 0.01 means only the top 1% of probabilities are significant, and alpha = 0.05 means the top 5%. Most people use alpha = 0.05.

In the table I link to, 1 - alpha determines the column you go down looking for a critical value (so alpha = 0.05 gives 0.95, or a 95% confidence interval), and v is your degrees of freedom, i.e. the row to look at.

If your computed t (in absolute value) is greater than the critical value, then your result is statistically significant. If it is less than the critical value, then your result is NOT significant.
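Rather than reading a printed table, base R's qt() returns the critical value directly. A sketch reusing t_value() and the timing vectors x1, x2 from above:

alpha  <- 0.05
df     <- length(x1) - 1       # degrees of freedom: #dataPoints - 1
t_crit <- qt(1 - alpha, df)    # the 1 - alpha column described above (use 1 - alpha/2 for a strict two-sided test)
abs(t_value(x1, x2)) > t_crit  # TRUE means the difference is significant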

Edit: The Student's t-test assumes that variances and standard deviations are the same between the two groups being compared. That is, it assumes the distributions of data points around the true means are equally spread. If you DON'T want to make this assumption, then you're looking for Welch's t-test, which is slightly different. The wiki page has a good formula for computing t-values for this test.
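In fact, base R's t.test() performs Welch's t-test by default (var.equal = FALSE), so the whole comparison can be done in one call:

t.test(x1, x2)                    # Welch's t-test (unequal variances)
t.test(x1, x2, var.equal = TRUE)  # the classic Student's t-test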

There is one situation you want to avoid:

  • spreading a task over all N cores

  • having each of those tasks also use a multithreaded linear algebra library such as OpenBLAS or MKL across all cores

because now you have N-by-N contention: each of the N tasks wants to farm its linear algebra work out to all N cores.
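A common remedy, sketched here assuming the RhpcBLASctl package (which provides blas_set_num_threads()), is to keep each worker's BLAS single-threaded:

library(RhpcBLASctl)
library(doMC)

registerDoMC(cores = detectCores())

foreach(f = files) %dopar% {
  blas_set_num_threads(1)  # this worker's linear algebra stays on one core
  # Code to do the conversion
}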

Another (trivial) counterexample arises in a multi-user environment, where not all M users on a machine can (simultaneously) farm work out to N cores.

Another reason not to use all the available cores is that your tasks may use a lot of memory, and you may not have enough memory to support that number of workers. Note that it can be tricky to determine how many workers a given amount of memory can support, because doMC uses mclapply, which forks the workers, so memory can be shared between the workers unless it is modified by one of them.
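A rough sizing sketch; mem_total_gb and mem_per_worker_gb are hypothetical estimates you would measure yourself, and fork-time page sharing means the true limit is usually more generous than this:

mem_total_gb      <- 16  # hypothetical: available RAM on the machine
mem_per_worker_gb <- 3   # hypothetical: peak usage of one conversion

n_workers <- min(detectCores(), floor(mem_total_gb / mem_per_worker_gb))
registerDoMC(cores = n_workers)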

From the answers to this question, it's pretty clear that it's not always easy to figure out the right number of workers to use. One could argue that there shouldn't be a default value, and that the user should be forced to specify the number, but I'm not sure I'd go that far. At any rate, there isn't anything very magical about using half the number of cores.

Hm. I'm not a parallel processing expert, but I always thought the downside of using all your cores was that it makes your machine sluggish when you try to do anything else. This has happened to me personally when I've used all the cores, so my practice now is to use 7 of my 8 cores when I'm doing something parallel, leaving one core free for other things.
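That habit is a one-liner with doMC (a sketch; the max(1, ...) guards single-core machines):

registerDoMC(cores = max(1, detectCores() - 1))  # leave one core free for interactive work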
