
Overriding default Parallel Collections behavior in Scala

I have a large batched parallel computation that I use a parallel map for in Scala. I have noticed that there appears to be a gradual stepping down of CPU usage as the workers finish. It all comes down to a call inside the Map object:

scala.collection.parallel.thresholdFromSize(length, tasksupport.parallelismLevel)

Looking at the code, I see this:

def thresholdFromSize(sz: Int, parallelismLevel: Int) = {
  val p = parallelismLevel
  if (p > 1) 1 + sz / (8 * p)
  else sz
}

My calculation works great on a large number of cores, and now I understand why:

thresholdFromSize(1000000,24) = 5209
thresholdFromSize(1000000,4) = 31251

If I have an array of length 1000000 on 24 CPUs, it will partition the work all the way down to chunks of 5209 elements. If I pass that same array into the parallel collections on my 4-CPU machine, it will stop partitioning at 31251 elements.
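For reference, the arithmetic can be reproduced directly. This is just a small sketch; threshold below is a local copy of the library function quoted above:

val sz = 1000000
def threshold(p: Int) = if (p > 1) 1 + sz / (8 * p) else sz

threshold(24) // 5209  -- smallest chunk a 24-core machine splits down to
threshold(4)  // 31251 -- smallest chunk a 4-core machine splits down to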

It should be noted that the runtime of my calculations is not uniform. Runtime per unit can be as much as 0.1 seconds. At 31251 items, that's roughly 3,125 seconds, or about 52 minutes, during which the other workers could be stepping in and grabbing work, but are not. I have observed exactly this behavior while monitoring CPU utilization during the parallel computation. Obviously I'd love to run on a large machine, but that's not always possible.

My question is this: is there any way to influence the parallel collections to use a smaller threshold that is better suited to my problem? The only thing I can think of is to make my own implementation of the class 'Map', but that seems like a very inelegant solution.

You want to read up on Configuring Scala parallel collections. In particular, you probably need to implement a TaskSupport trait.

I think all you need to do is something like this:

yourCollection.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(24))

The parallelism parameter defaults to the number of CPU cores that you have, but you can override it as above. This is shown in the source for ParIterableLike as well.
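Putting that together, here is a minimal sketch. The collection data and the expensiveStep function are placeholders, not from the original question; the point is that the parallelism level handed to the ForkJoinPool is what thresholdFromSize sees, so raising it above the physical core count also shrinks the partition size:

import scala.collection.parallel.ForkJoinTaskSupport

def expensiveStep(x: Int): Int = { Thread.sleep(100); x * 2 } // stand-in for a ~0.1 s unit of work

val data = (1 to 1000000).toArray.par
// A parallelism level higher than the 4 physical cores lowers
// thresholdFromSize(length, parallelismLevel), so the chunks get smaller.
data.tasksupport = new ForkJoinTaskSupport(
  new scala.concurrent.forkjoin.ForkJoinPool(24))
val results = data.map(expensiveStep)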

0.1 seconds is long enough to handle each unit separately. Wrap the processing of each unit (or of 10 units) in a separate Runnable and submit all of them to a FixedThreadPool. Another approach is to use a ForkJoinPool, which makes it easier to control when all the computations have finished.
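A rough sketch of that manual approach, using Callable (a Runnable that returns a value) so the per-unit results can be collected; process here is a hypothetical stand-in for the 0.1-second computation:

import java.util.concurrent.{Callable, Executors}
import scala.collection.JavaConverters._

def process(x: Int): Int = { Thread.sleep(100); x * 2 } // placeholder unit of work

val pool = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors)
val tasks = (1 to 1000000).map(i => new Callable[Int] { def call(): Int = process(i) })
// invokeAll blocks until every task has completed and returns the Futures in submission order.
val results = pool.invokeAll(tasks.asJava).asScala.map(_.get)
pool.shutdown()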
