
Scala looping choice: functional looping vs. traditional for loop

Is looping over collections using functional constructs (map, foreach, flatMap, etc.) better? As a dummy problem, suppose I have a list of strings and I want to filter the strings by different criteria and then map over them to get some value. Consider the code below:

val x1 = list.filter(criteria1).map(do_something)
val x2 = list.filter(criteria2).map(do_something)

Say I have 5 such filter criteria. Done this way, I would be looping over the list (which may be large) 10 times (once per filter and once per map).

However, I could group all of this into one for loop that populates 5 new lists in a single iteration, and then map over each one, for a total of 6 loops instead of 10.

for (x <- list) {
  if (criteria1(x)) { /* filter: add x to the first result list */ }
  if (criteria2(x)) { /* filter: add x to the second result list */ }
}

This code may force me to use mutable lists, but strictly from a performance point of view, does using the functional constructs in such a situation make sense? Which would be the better approach?

Note: The above code/problem just serves as an example. I hope it explains the kind of situation I'm referring to.

If you're looking to filter and map, you can use withFilter instead of filter, which makes the filter lazy so that you're not traversing the list multiple times. for-expressions use withFilter for efficiency. You can also look into views, which provide similar laziness for other operations.
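To make the difference concrete, here is a minimal sketch. The names list, criteria1 and do_something are stand-ins for the ones in the question, with made-up definitions so that the snippet runs on its own.

```scala
// Hypothetical stand-ins for the question's list, criteria1 and do_something.
val list: List[Int] = (1 to 10).toList
def criteria1(x: Int): Boolean = x % 2 == 0
def do_something(x: Int): Int = x * 10

// filter builds an intermediate list, then map traverses that list again.
val eager = list.filter(criteria1).map(do_something)

// withFilter only records the predicate; map then tests and transforms
// each element in one traversal, without an intermediate list.
val viaWithFilter = list.withFilter(criteria1).map(do_something)

// A view defers filter and map alike; toList runs both in one pass.
val viaView = list.view.filter(criteria1).map(do_something).toList
```

All three produce the same result; only the number of traversals and intermediate collections differs.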

It's not totally clear from the question what you're trying to do, but I think you want to output 5 new lists based on different filter and map operations. Using loops and mutable builders like you suggest is a reasonable approach if performance is paramount, and this is how many of the collection methods are programmed (check the source code). I'm not sure why you think you'd need to filter into 5 lists and then traverse each one to do the mapping. Why not just do the map at the same time as you're building the new lists, by applying the function to each element? E.g.:

  import scala.collection.mutable.ListBuffer

  def split[T](xs: Seq[T])(ops: (T => Boolean, T => T)*): Seq[Seq[T]] = {
    val (filters, maps) = ops.unzip
    val buffers = IndexedSeq.fill(ops.size)(ListBuffer.empty[T])
    for {
      x <- xs
      i <- buffers.indices
      if filters(i)(x)
    } buffers(i) += maps(i)(x)  
    buffers.map(_.toSeq)  // return to immutable-land
  }

  // demo: 
  val res = split(1 to 10)(
    (_ < 5, _ * 100),     // multiply everything under 5 by 100
    (_ % 2 == 1, 0 - _),  // negate all odd numbers
    (_ % 3 == 0, _ + 5)   // add 5 to numbers divisible by 3
  )

  println(res) 
  //Vector(List(100, 200, 300, 400), List(-1, -3, -5, -7, -9), List(8, 11, 14))

I don't think there's a built-in method to do what (I think) you want to do. Note that you CAN define a builder method without mutable state if you use recursion, but this is one place where local mutable state is more concise / readable.

Your question really comes down to performance, and it's easy to prematurely optimize. I'd recommend you only do the above if you do have a genuine performance problem. If idiomatic / simple is not good enough, THEN you might be able to tweak things to optimize your particular use-case. It just comes down to the fact that there can't be built-in optimized methods for everything you might want to do.

You can also do it this way:

val x1 = for (x <- list if criteria1(x)) yield do_something(x)

The compiler transforms this to roughly val x1 = list.withFilter(criteria1).map(do_something), which is essentially what you had above, except that withFilter does not build an intermediate list the way filter does. The for comprehension is just some nice syntactic sugar that lets you turn complex aggregates of operations on some sequence into something more readable. You can read the relevant chapter in Odersky's book for more details.

Back to your question though. If you're trying to produce 5 different lists based on different filters and maps, maybe you should make a list of lists instead. You can use for comprehensions to loop over the input list for each pair of transformation functions.

That would help you make the code a bit simpler, but it won't actually reduce the algorithmic complexity of the problem (ie you'd still iterate over the list 5 times).
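As a rough sketch of that list-of-lists idea: the ops pairs below are hypothetical examples, not something from the question, and the input list is traversed once per pair, exactly as described above.

```scala
val list: List[Int] = (1 to 10).toList

// Hypothetical (criterion, transformation) pairs, purely for illustration.
val ops: List[(Int => Boolean, Int => Int)] = List(
  ((x: Int) => x < 5, (x: Int) => x * 100),
  ((x: Int) => x % 2 == 1, (x: Int) => -x)
)

// The outer comprehension walks the pairs; the inner one filters and maps
// the input list for that pair, so the list is traversed once per pair.
val results: List[List[Int]] =
  for ((p, f) <- ops) yield for (x <- list if p(x)) yield f(x)
```

Adding a sixth criterion then means adding one more pair to ops rather than another val.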

In this situation, I think you're right in that using an imperative-style loop would be much more efficient. The recommended data structure for building a list is the ListBuffer because you can add an element to either end in constant time—and then when you're done building the list you can turn it into an immutable list (also in constant time). There's also a small section on using ListBuffer in Odersky's book. Here's how I'd do it:

import scala.collection.mutable.ListBuffer

val b1 = new ListBuffer[Int]
val b2 = new ListBuffer[Int]
// ... b3, b4, b5

for (x <- list) {
  lazy val y = do_something(x)  // evaluated at most once, and only if some criterion matches
  if (criteria1(x)) b1 += y
  if (criteria2(x)) b2 += y
  // ... criteria3, criteria4, criteria5
}

val x1 = b1.toList
val x2 = b2.toList
// ... x3, x4, x5

Since it's using a mutable ListBuffer this code isn't very "pure" anymore—but it might be worth the speedup for long lists since you no longer have to traverse the whole list 5 times.

I wouldn't really say that one method is much better than the other in this case. The ListBuffer way uses mutation, which is faster but might make the code harder to maintain. In contrast, the more functional version just uses repeated calls to filter and map on the original list, which is probably easier to read (assuming the reader is familiar with idiomatic Scala of course) and easier to maintain, but might run a little slower. The choice really depends on what your goal is.

I'm not so sure that going over the list several times is going to be slower. You have to build your m lists of length k out of a list of length n. Either way, each of the n elements has to be tested against each of the m criteria, so you'll do m*n comparisons. If it is slower, then it's by some constant factor. I don't know if that factor is small or large.
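One way to check this is to count predicate evaluations for both styles. The following is only a sketch with made-up criteria and counters. It measures the number of tests, not wall-clock time, so it says nothing about the size of the constant factor.

```scala
import scala.collection.mutable.ListBuffer

val list: List[Int] = (1 to 1000).toList
// Hypothetical criteria, chosen only for illustration.
val criteria: List[Int => Boolean] = List(_ % 2 == 0, _ % 3 == 0, _ % 5 == 0)

// Functional style: one filter pass over the list per criterion.
var multiPassTests = 0
val multiPass: List[List[Int]] =
  criteria.map(p => list.filter { x => multiPassTests += 1; p(x) })

// Imperative style: one pass over the list, testing every criterion per element.
var singlePassTests = 0
val buffers = criteria.map(_ => ListBuffer.empty[Int])
for (x <- list; (p, b) <- criteria.zip(buffers)) {
  singlePassTests += 1
  if (p(x)) b += x
}
val singlePass: List[List[Int]] = buffers.map(_.toList)
```

Both counters end up at criteria.size * list.size, so any difference in speed comes from per-pass overhead rather than from the number of tests.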

If you really want to do it in one pass, it's definitely possible. Any operation on a list can be done in a single pass with a fold. It can be a bit complicated, and highlights why it might not be any faster. It's certainly harder to read:

val cs = List((criteria1, f1), (criteria2, f2))
val xs = list.foldRight(cs.map(_ => Nil)) { (x, rs) =>
  (cs zip rs).map { case ((p, f), r) =>
    if (p(x)) f(x) :: r else r
  }
}

You may need some more type annotation than I've given here.
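For example, a fully annotated variant might look like the sketch below. The element type Int and the concrete criteria and functions are assumptions made so that it compiles on its own.

```scala
val list: List[Int] = (1 to 10).toList

// Hypothetical criteria and functions; the question does not specify them.
val cs: List[(Int => Boolean, Int => Int)] = List(
  ((x: Int) => x < 5, (x: Int) => x * 100),
  ((x: Int) => x % 3 == 0, (x: Int) => x + 5)
)

// One empty, explicitly typed result list per criterion. A bare Nil here is
// typically inferred too narrowly for the cons in the fold body to type-check.
val empty: List[List[Int]] = cs.map(_ => List.empty[Int])

// foldRight visits elements from the right, so consing preserves input order.
val xs: List[List[Int]] = list.foldRight(empty) { (x, rs) =>
  cs.zip(rs).map { case ((p, f), r) =>
    if (p(x)) f(x) :: r else r
  }
}
```

Note that every step rebuilds the outer list of results, which is part of why a fold like this is not obviously faster than several plain passes.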

You can also use laziness to your advantage here:

list.toStream.filter(???).map(???)

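This does almost no work up front. A Stream's head is strict, so filter advances to the first matching element, but the rest of the list doesn't get filtered and mapped until you request more elements of the result. (On Scala 2.13+, LazyList defers the head as well.) Obviously use your real code instead of ???.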
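To see the deferred evaluation in action, here is a small sketch that counts how often the predicate runs. It uses an Iterator rather than a Stream, because Iterator's filter and map are fully lazy. The counter and the sample predicate are made up for the demonstration.

```scala
val list: List[Int] = (1 to 10).toList

var evaluated = 0 // counts how many elements the predicate has seen

// Building the pipeline does no work: Iterator.filter and map are lazy.
val pipeline = list.iterator
  .filter { x => evaluated += 1; x % 2 == 0 }
  .map(_ * 10)

val before = evaluated                  // nothing has been tested yet
val firstTwo = pipeline.take(2).toList  // pulls only as many elements as needed
val after = evaluated                   // fewer tests than list.size
```

The catch is that an Iterator can only be consumed once, so this suits a single downstream consumer better than five separate result lists.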

Is the iterating part really relevant for your performance? In most cases I doubt it. Only if it is will the single for loop actually be faster.

But if you have to use mutable data types for it, chances are it becomes much harder to run on multiple cores. If this really is a performance-critical situation, the gain from running on 8-800 cores will be huge compared with the little you gain from saving a few loop iterations.

Note that the for comprehension often isn't optimal for performance anyway since it might have to create lots of closure instances.

If I understand correctly, you want to make multiple lists out of a single list depending on different criteria. I think groupBy would serve the purpose:

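val grouped = list.groupBy { item =>
  val c1 = criteria1(item)
  val c2 = criteria2(item)
  if (c1 && c2) 12   // matches both criteria
  else if (c1) 1
  else if (c2) 2
  else 0             // matches neither
}
// Drop the "neither" group and apply do_something to each element of every group.
val result = (grouped - 0).map { case (k, items) => k -> items.map(do_something) }
// A key is missing when no element fell into that group, hence getOrElse.
// Note: elements matching both criteria come last, not in original list order.
val x1 = result.getOrElse(1, Nil) ++ result.getOrElse(12, Nil)
val x2 = result.getOrElse(2, Nil) ++ result.getOrElse(12, Nil)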

As Apocalisp mentioned, you can also take advantage of laziness by using view and force, like this:

val grouped = list.view.groupBy{ ...
...
val x1 = (result(1) ++ result(12)).force

As it has not been mentioned before, you may also want to consider that the combination of filter and map is available in a shorter form via collect. So you can do something like this:

list.collect {
  case x if criteria1(x) => ...
  case x if criteria2(x) => ....
  case _ => ...  // with this catch-all every element is kept; omit it to drop non-matching elements
}

However, this slightly changes the semantics for list elements that satisfy both criteria1 and criteria2: only the first matching case is applied to them. Similarly to what Chris proposed, you could add a first case x if criteria1(x) && criteria2(x), but that won't scale to multiple such criteria of course.

One point you left unclear, though, is whether you want to construct actual result lists (as in your first example) or just execute some side effects (as in your second example). The latter could also be achieved by a slightly different approach, as illustrated by the following example:
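Here is a small sketch of both points, with a made-up criteria1 and sample data: collect as a one-pass filter-and-map, and the first-match-wins behaviour when an element satisfies more than one guard.

```scala
val list: List[Int] = (1 to 10).toList
def criteria1(x: Int): Boolean = x % 2 == 0 // hypothetical criterion

// One pass: elements failing the guard are dropped, the rest are transformed.
val viaCollect = list.collect { case x if criteria1(x) => x * 10 }

// With several guarded cases, only the first matching case fires per element,
// so 2 and 4 are labelled "small" even though they are also even.
val firstMatchWins = list.collect {
  case x if x < 5      => "small"
  case x if x % 2 == 0 => "even"
}
```

So collect fits best when the criteria are mutually exclusive, or when one result per element is what you want.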

// A list of criteria and corresponding effects
val criteriaEffects = List( 
  ( (x : Int) => x == 0, (x : Int) => { println("Effect 1: " + x) } ),
  ( (x : Int) => x == 1, (x : Int) => { println("Effect 2: " + x) } ) )

// now run through your values list
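// foreach rather than map, since only the side effects matter here
List(0, 1, 2).foreach { x =>
  criteriaEffects.foreach { case (criterion, effect) => if (criterion(x)) effect(x) }
}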


 