
Scala looping choice: functional looping vs traditional for loop

Is looping over collections using functional constructs (map, foreach, flatMap, etc.) better? As a dummy problem, suppose I have a list of strings and I want to filter the strings by different criteria and then map over them to get some value. Consider the code below:

val x1 = list.filter(criteria1).map(do_something)
val x2 = list.filter(criteria2).map(do_something)

Say I have 5 such filter criteria. Done this way, I would be looping over the list (which may be large) 10 times: once for each filter and once for each map.

However, I could group this all into one for loop and populate 5 new lists in a single iteration, then map over each one, for a total of 6 loops instead of 10.

for (i <- list.indices) {
  if (criteria1(list(i))) { /* add list(i) to the first result list */ }
  if (criteria2(list(i))) { /* add list(i) to the second result list */ }
}

This code may force me to use mutable lists, but strictly from a performance point of view, does using the functional constructs in such a situation make sense? Which would be the better approach?

Note: The above code/problem is just an example; I hope it explains the kind of situation I'm referring to.

If you're looking to filter and map, you can use withFilter instead of filter, which makes the filter lazy so that you're not traversing the list multiple times. for-expressions use withFilter for efficiency. You can also look into views, which provide similar laziness for other operations.
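To make the difference concrete, here is a minimal sketch comparing the three variants. The values list, criteria1 and do_something are hypothetical stand-ins for the names in the question, not anything from the asker's real code.

```scala
// Hypothetical stand-ins for the question's list, criteria1 and do_something.
val list: List[Int] = (1 to 10).toList
val criteria1: Int => Boolean = _ % 2 == 0
val do_something: Int => Int = _ * 10

// filter builds an intermediate List, which map then traverses again.
val strict: List[Int] = list.filter(criteria1).map(do_something)

// withFilter only records the predicate; map applies it while building the result.
val viaWithFilter: List[Int] = list.withFilter(criteria1).map(do_something)

// A view defers both filter and map until toList forces them.
val viaView: List[Int] = list.view.filter(criteria1).map(do_something).toList
```

All three produce the same elements; they differ only in whether an intermediate list is built along the way.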

It's not totally clear from the question what you're trying to do, but I think you want to output 5 new lists based on different filter and map operations. Using loops and mutable builders like you suggest is a reasonable approach if performance is paramount, and this is how many of the collection methods are implemented (check the source code). I'm not sure why you think you'd need to filter into 5 lists and then traverse each one to do the mapping. Why not just do the map at the same time as you're building the new lists, by applying the function to each element? E.g.:

  import scala.collection.mutable.ListBuffer

  def split[T](xs: Seq[T])(ops: (T => Boolean, T => T)*): Seq[Seq[T]] = {
    val (filters, maps) = ops.unzip
    val buffers = IndexedSeq.fill(ops.size)(ListBuffer.empty[T])
    for {
      x <- xs
      i <- buffers.indices
      if filters(i)(x)
    } buffers(i) += maps(i)(x)  
    buffers.map(_.toSeq)  // return to immutable-land
  }

  // demo: 
  val res = split(1 to 10)(
    (_ < 5, _ * 100),     // multiply everything under 5 by 100
    (_ % 2 == 1, 0 - _),  // negate all odd numbers
    (_ % 3 == 0, _ + 5)   // add 5 to numbers divisible by 3
  )

  println(res) 
  //Vector(List(100, 200, 300, 400), List(-1, -3, -5, -7, -9), List(8, 11, 14))

I don't think there's a built-in method to do what (I think) you want to do. Note that you CAN define a builder method without mutable state if you use recursion, but this is one place where local mutable state is more concise and readable.

Your question really comes down to performance, and it's easy to optimize prematurely. I'd recommend you only do the above if you have a genuine performance problem. If the idiomatic, simple version is not good enough, THEN you might be able to tweak things to optimize your particular use case. It just comes down to the fact that there can't be built-in optimized methods for everything you might want to do.

You can also do it this way:

val x1 = for(x <- list if criteria1) yield do_something(x)

The compiler actually transforms this to val x1 = list.withFilter(criteria1).map(do_something), which gives the same result as the filter/map version you had above. The for comprehension is just some nice syntactic sugar that lets you turn complex aggregates of operations on some sequence into something more readable. You can read the relevant chapter in Odersky's book for more details.
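As a quick check of that equivalence, here is a small sketch. The definitions of list, criteria1 and do_something are made up for illustration. It assumes Scala 2.8 or later, where a guard in a for comprehension is rewritten to withFilter rather than filter; the resulting list is the same either way.

```scala
// Hypothetical stand-ins for the names used in the question.
val list: List[Int] = List(1, 2, 3, 4, 5, 6)
def criteria1(x: Int): Boolean = x > 3
def do_something(x: Int): Int = x * x

// The for comprehension with a guard...
val viaFor: List[Int] = for (x <- list if criteria1(x)) yield do_something(x)

// ...and the method calls it is rewritten to.
val viaMethods: List[Int] = list.withFilter(criteria1).map(do_something)
```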

Back to your question though. If you're trying to produce 5 different lists based on different filters and maps, maybe you should make a list of lists instead. You can use for comprehensions to loop over the input list for each pair of transformation functions.

That would make the code a bit simpler, but it won't actually reduce the algorithmic complexity of the problem (i.e. you'd still iterate over the list 5 times).

In this situation, I think you're right that an imperative-style loop would be much more efficient. The recommended data structure for building a list is the ListBuffer, because you can add an element to either end in constant time, and when you're done building the list you can turn it into an immutable list (also in constant time). There's also a small section on using ListBuffer in Odersky's book. Here's how I'd do it:

import scala.collection.mutable.ListBuffer

val b1 = new ListBuffer[Int]
val b2 = new ListBuffer[Int]
// ... b3, b4, b5

for (x <- list) {
  val y = do_something(x)
  if (criteria1(x)) b1 += y
  if (criteria2(x)) b2 += y
  // ... criteria3, criteria4, criteria5
}

val x1 = b1.toList
val x2 = b2.toList
// ... x3, x4, x5

Since it uses a mutable ListBuffer, this code isn't very "pure" anymore, but it might be worth the speedup for long lists since you no longer have to traverse the whole list 5 times.

I wouldn't really say that one method is much better than the other in this case. The ListBuffer way uses mutation, which is faster but might make the code harder to maintain. In contrast, the more functional version just uses repeated calls to filter and map on the original list, which is probably easier to read (assuming the reader is familiar with idiomatic Scala, of course) and easier to maintain, but might run a little slower. The choice really depends on what your goal is.

I'm not so sure that going over the list several times is going to be slower. You have to build your m lists of length up to k out of a list of length n, so either way you'll have to test each of the n elements against all m criteria. If it is slower, then it's by some constant factor. I don't know if that factor is small or large.

If you really want to do it in one pass, it's definitely possible. Any operation on a list can be done in a single pass with a fold. It can be a bit complicated, though, which highlights why it might not be any faster. It's certainly harder to read:

val cs = List((criteria1, f1), (criteria2, f2))
val xs = list.foldRight(cs.map(_ => Nil)) { (x, rs) =>
  (cs zip rs).map { case ((p, f), r) =>
    if (p(x)) f(x) :: r else r
  }
}

You may need some more type annotations than I've given here.
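For reference, here is one way the annotations might be filled in so that the fold compiles. This is only a sketch: the Int element type and the two criteria/function pairs are invented for the example and stand in for criteria1/f1 and criteria2/f2.

```scala
// Hypothetical input and (criterion, mapping) pairs, all on Int for simplicity.
val list: List[Int] = (1 to 10).toList
val cs: List[(Int => Boolean, Int => Int)] = List(
  ((x: Int) => x < 5, (x: Int) => x * 100),
  ((x: Int) => x % 2 == 1, (x: Int) => -x)
)

// One empty result list per criterion. The explicit element type keeps the
// accumulator from being inferred as List[Nil.type].
val empty: List[List[Int]] = cs.map(_ => List.empty[Int])

// Single pass from the right, so prepending preserves the original order.
val xs: List[List[Int]] = list.foldRight(empty) { (x, rs) =>
  cs.zip(rs).map { case ((p, f), r) =>
    if (p(x)) f(x) :: r else r
  }
}
```

Note that each step still rebuilds the list of accumulators, which is part of why this may not beat the plain filter/map version.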

You can also use laziness to your advantage here:

list.toStream.filter(???).map(???)

This traverses the list zero times. The elements don't actually get filtered and mapped until you request the elements of the result. Obviously, use your real code instead of ???.
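The "nothing happens until you ask" behaviour can be observed with a counter. Note that this sketch swaps in an Iterator instead of toStream, because Stream is deprecated as of Scala 2.13 in favor of LazyList and an Iterator behaves the same way on older and newer versions. The list, predicate and mapping are made up for illustration.

```scala
// Counts how many elements the predicate has been applied to so far.
var evaluated = 0
val list: List[Int] = List(1, 2, 3, 4, 5, 6)

// Building the pipeline does not touch any element yet.
val lazyResult: Iterator[Int] =
  list.iterator.filter { x => evaluated += 1; x % 2 == 0 }.map(_ * 10)
val before: Int = evaluated

// Requesting two results pulls only as many elements as are needed for them.
val firstTwo: List[Int] = lazyResult.take(2).toList
val after: Int = evaluated
```

Unlike a Stream, an Iterator can only be consumed once, so this fits best when each result is needed a single time.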

Is the iteration itself really relevant to your performance? In most cases I doubt it. Only if it is will the single for loop be faster.

But if you have to use mutable data types for it, chances are it is now much harder to run on multiple cores. If this really is a performance-critical situation, the gain you get from running it on 8 to 800 cores will be huge compared to the little you gain from saving one loop iteration.

Note that the for comprehension often isn't optimal for performance anyway, since it might have to create lots of closure instances.

If I understand correctly, you want to make multiple lists out of a single list depending on different criteria. I think groupBy would serve the purpose:

val grouped = list.groupBy{ item => {
    val c1 = criteria1(item)
    val c2 = criteria2(item)
    if (c1 && c2) 12
    else if (c1) 1
    else if (c2) 2
    else 0
}}
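val excluded0 = grouped - 0
// apply do_something to each element of every group; a missing key yields an empty list
val result = excluded0.map { case (k, v) => k -> v.map(do_something) }.withDefaultValue(Nil)
val x1 = result(1) ++ result(12)
val x2 = result(2) ++ result(12)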

As Apocalisp mentioned, you can also take advantage of laziness by using view and force, like:

val grouped = list.view.groupBy{ ...
...
val x1 = (result(1) ++ result(12)).force

Since it has not been mentioned yet: the combination of filter and map is also available in a shorter form via collect. So you can do something like this:

list.collect {
  case x if criteria1(x) => ...
  case x if criteria2(x) => ....
  case _ => ...
}

However, this slightly changes the semantics for list elements that satisfy both criteria1 and criteria2, since only the first matching case applies. Similarly to what Chris proposed, you could add a first case x if criteria1(x) && criteria2(x), but of course that won't scale to many such criteria.
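Here is a small sketch of both points: collect as a one-step filter plus map for a single criterion, and the first-match behaviour when several guarded cases overlap. The list, criteria1 and do_something are hypothetical stand-ins, and the string labels exist only to show which case fired.

```scala
// Hypothetical stand-ins for the question's names.
val list: List[Int] = List(1, 2, 3, 4, 5, 6)
def criteria1(x: Int): Boolean = x % 2 == 0
def do_something(x: Int): Int = x * 10

// One criterion: equivalent in result to list.filter(criteria1).map(do_something).
val x1: List[Int] = list.collect { case x if criteria1(x) => do_something(x) }

// Overlapping guards: 6 is both even and divisible by 3, but only the first case applies.
val tagged: List[String] = list.collect {
  case x if x % 2 == 0 => "even:" + x
  case x if x % 3 == 0 => "three:" + x
}
```

Elements matching no case (1 and 5 here) are simply dropped, which is what makes collect act as a filter.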

One point you left unclear is whether you want to construct actual result lists (as in your first example) or just execute some side effects (as in your second example). The latter could also be achieved by a slightly different approach, as illustrated by the following example:

// A list of criteria and corresponding effects
val criteriaEffects = List( 
  ( (x : Int) => x == 0, (x : Int) => { println("Effect 1: " + x) } ),
  ( (x : Int) => x == 1, (x : Int) => { println("Effect 2: " + x) } ) )

// now run through your values list
List(0, 1, 2).foreach(x => criteriaEffects.foreach { case (p, effect) => if (p(x)) effect(x) })
