简体   繁体   中英

Scala: Compare all elements in a huge list

Please advice on algorithm and implementation to compare elements in a very long list in Scala. I have a list with thousands of strings (from SQL) and I need to compare each list element with all other elements in this list.

As a result I need to get a list of tuples: List[(String, String, Boolean)] where first two elements are strings to match and third is a result.

For a list of N elements my algorithm so far is as follows:

  1. Take head of the list
  2. Compare head with remaining N-1 elements in the list
  3. Make new list from a tail of the old list and do all above work with this new list of N -1 elements:

Code:

   /**
   * Compare head of the list with each remaining element in this list
   */
  def cmpel(
    fst: String, lst: List[String],
    result: List[(String, String, Boolean)]): List[(String, String, Boolean)] = {

    lst match {
      case next :: tail => cmpel(fst, tail, (fst, next, fst == next) :: result)
      case nill => result.reverse
    }
  }

  /**
   * Compare list elements in all combinations of two
   */
  def cmpAll(lst: List[String],
    result: List[(String, String, Boolean)]): List[(String, String, Boolean)] = {
    lst match {
      case head :: tail => cmpAll(tail, result ++ cmpel(head, tail, List()))
      case nill => result
    }
  }

  def main(args: Array[String]): Unit = {
    val lst = List[String]("a", "b", "b", "a")
    println(cmpAll(lst, List()))
  }

Result:

 List((a,b,false), (a,b,false), (a,a,true), (b,b,true), (b,a,false), (b,a,false))

Thanks!

You can use the tails and flatMap methods to write a more concise and idiomatic solution:

list.tails.flatMap {
  case x :: rest => rest.map { y =>
    (x, y, x == y)
  }
  case _ => List()
}.toList

The tails method returns an iterator that iterates over repeated applications of .tail to the list. The first element in the iterator is the list itself, then the tail of the list, and so on, finally returning the empty list.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM