Scala sort list of tuples by attribute

I was wondering what's an easy way to sort a List[(String, String)] alphabetically by the first string in each tuple, while keeping only one tuple for each distinct second string (i.e. removing duplicates on the second string).

Thanks for the suggestions!

We could run a performance test?

val items = List("a" -> 0, "b" -> 1, "c" -> 0, "d" -> 0, "e" -> 1)
items.groupBy(_._2).toList                            // group by the second element
     .sortBy(_._2.head._1)(Ordering[String].reverse)  // sort groups by their first string, descending
     .map(e => (e._2.head._1 -> e._1))                // keep the first tuple of each group

Result:

List((b,1), (a,0))
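The answer above sorts with a reversed string ordering, so `b` comes out before `a`. A minimal sketch of the same `groupBy` idea producing the ascending (A-to-Z) order the question asks for, assuming Scala 2.12 or later; `GroupThenSort` and `dedupeAndSort` are hypothetical names used only for illustration:

```scala
// Hypothetical helper names, for illustration only.
object GroupThenSort {
  // Keep one tuple per distinct second element (the first one seen in
  // input order), then sort the survivors by their first element, A to Z.
  def dedupeAndSort(items: List[(String, Int)]): List[(String, Int)] =
    items
      .groupBy(_._2)   // Map[Int, List[(String, Int)]]; each group keeps input order
      .values
      .map(_.head)     // first tuple of each group
      .toList
      .sortBy(_._1)    // ascending by the first element

  def main(args: Array[String]): Unit = {
    val items = List("a" -> 0, "b" -> 1, "c" -> 0, "d" -> 0, "e" -> 1)
    println(dedupeAndSort(items)) // prints List((a,0), (b,1))
  }
}
```

Dropping the reversed ordering is the only change needed for ascending output; which duplicate survives is still decided by input order within each group.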

For the curious, the answer linked in the comment still works.

If you already have a TreeMap, then groupBy will build TreeMaps for the groups. The following line is similar to idonnie's, except that there is no need to sort again:

TreeMap(data: _*) groupBy (_._2) map (p => (p._2.head._1, p._1))

I'm not sure why it can't build a TreeMap from a List of pairs out of the box. I'm fighting the fog of cold medicine. But it's always fun when people post these questions.
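In the TreeMap one-liner above, only the inner groups are TreeMaps; the outer result of `groupBy` is an ordinary `Map`, so the surviving pairs are not in key order. A minimal sketch that wraps the same TreeMap approach in a function and sorts the result once at the end, assuming Scala 2.12 or later; `TreeMapDedupe` and `smallestKeyPerValue` are hypothetical names:

```scala
import scala.collection.immutable.TreeMap

// Hypothetical helper names, for illustration only.
object TreeMapDedupe {
  // Build a TreeMap keyed by the first string, group it by the second
  // string, and keep the smallest key of each group.
  // Caveat: a TreeMap holds one entry per key, so if the same first string
  // appears more than once, only its last pair survives TreeMap(...).
  def smallestKeyPerValue(data: List[(String, String)]): List[(String, String)] =
    TreeMap(data: _*)
      .groupBy(_._2)                                         // each group is itself a TreeMap
      .map { case (value, group) => (group.head._1, value) } // head = smallest key in the group
      .toList
      .sortBy(_._1)                                          // the outer Map is unordered

  def main(args: Array[String]): Unit = {
    val data = List("eins" -> "one", "un" -> "one", "zwei" -> "two", "deux" -> "two")
    println(smallestKeyPerValue(data)) // prints List((deux,two), (eins,one))
  }
}
```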

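// Written for Scala 2.10-2.12; CanBuildFrom was removed in the 2.13 collections redesign.
package object distinctby {
  import scala.collection.IterableLike
  import scala.collection.generic.CanBuildFrom
  import scala.collection.mutable.{ Set => MutSet }

  // Adds distinctBy to collections: keeps the first element for each key f(a),
  // preserves the original order, and builds the result through the implicit
  // CanBuildFrom (a List for a List input).
  implicit class RichCollection[A, Repr](val xs: IterableLike[A, Repr]) extends AnyVal {
    def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
      val builder = cbf(xs.repr)
      val i = xs.iterator
      val set = MutSet[B]()      // keys seen so far
      while (i.hasNext) {
        val o = i.next
        val b = f(o)
        if (!set(b)) {           // first time this key appears: keep the element
          set += b
          builder += o
        }
      }
      builder.result
    }
  }
}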

package distinctby {
  import scala.collection.generic.CanBuildFrom
  import scala.collection.immutable.TreeMap
  object Test extends App {
    val data = List("eins"->"one","un"->"one","zwei"->"two","deux"->"two")
    println(data)
    println(data distinctBy (_._2))
    println(TreeMap((data map (_.swap)): _*))
    println(TreeMap((data.reverse map (_.swap)): _*))
    // groupBy yields a Map whose values are TreeMaps, so each group's head has the lexically smallest key
    println(TreeMap(data: _*) groupBy (_._2) map (p => (p._2.head._1, p._1)))
    locally {
      class CBF[A,B](implicit ord: Ordering[A]) 
        extends CanBuildFrom[List[(A, B)], (A,B), TreeMap[A,B]] {
        def apply(from: List[(A,B)]) = TreeMap.newBuilder[A, B](ord)
        def apply() = TreeMap.newBuilder[A, B]
      }
      implicit def cbf[A,B](implicit ord: Ordering[A]) = new CBF[A,B]
      println(data.distinctBy[String, TreeMap[String, String]](_._2)(cbf[String, String]).toSeq)
    }
  }
}

Output:

List((eins,one), (un,one), (zwei,two), (deux,two))
List((eins,one), (zwei,two))
Map(one -> un, two -> deux)
Map(one -> eins, two -> zwei)
Map(eins -> one, deux -> two)
ArrayBuffer((eins,one), (zwei,two))
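The helper above targets the pre-2.13 collections. On Scala 2.13 and later, `List` ships its own `distinctBy`, so no helper is needed. A minimal sketch, assuming Scala 2.13+; `BuiltInDistinctBy`, `keepFirstSeen` and `keepSmallestFirst` are hypothetical names. The order of the two calls decides which duplicate survives:

```scala
// Requires Scala 2.13+ (Seq.distinctBy). Hypothetical helper names, for illustration only.
object BuiltInDistinctBy {
  // distinctBy keeps the first element seen for each key and preserves order.
  def keepFirstSeen(data: List[(String, String)]): List[(String, String)] =
    data.distinctBy(_._2).sortBy(_._1) // dedupe in input order, then sort

  def keepSmallestFirst(data: List[(String, String)]): List[(String, String)] =
    data.sortBy(_._1).distinctBy(_._2) // sort first; the result stays sorted

  def main(args: Array[String]): Unit = {
    val data = List("eins" -> "one", "un" -> "one", "zwei" -> "two", "deux" -> "two")
    println(keepFirstSeen(data))     // prints List((eins,one), (zwei,two))
    println(keepSmallestFirst(data)) // prints List((deux,two), (eins,one))
  }
}
```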

I would just use List's sorted method, which orders the tuples by their default ordering (first element, then second). You can also use groupBy (as idonnie also pointed out) to keep one pair per distinct second item. Here's a small example:

scala> val tuples = scala.util.Random.shuffle("abcd".permutations.map{_.splitAt(2)}.toList) // generate some test data
tuples: List[(String, String)] = List((cb,ad), (dc,ab), (ba,dc), (bd,ca), (cb,da), (ca,db), (cd,ba), (cd,ab), (db,ca), (ba,cd), (ac,db), (ac,bd), (ab,cd), (ad,cb), (ca,bd), (bd,ac), (ad,bc), (db,ac), (da,bc), (da,cb), (bc,da), (dc,ba), (ab,dc), (bc,ad))

scala> tuples.sorted.groupBy(_._2).values.map(_.head).toList.sorted
res0: List[(String, String)] = List((ab,cd), (ab,dc), (ac,bd), (ac,db), (ad,bc), (ad,cb), (bc,ad), (bc,da), (bd,ac), (bd,ca), (cd,ab), (cd,ba))

It seems a bit strange to me that you didn't say which of the duplicates to keep. This should keep the first one (ordered by the first string), which is what I'm assuming you wanted. If you don't care which one is kept, you can drop the first call to sorted. It would probably also be slightly more efficient to replace the first call to sorted with sortBy(_._1), but I don't know enough about your specific application to say whether that would make any real difference.
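Rather than relying on an initial sort to decide which duplicate `groupBy` keeps, the choice can be stated directly. A minimal sketch, assuming Scala 2.13+ for `groupMapReduce` and assuming the intent is to keep, for each second string, the pair with the lexically smallest first string; `ExplicitChoice` and `smallestFirstPerSecond` are hypothetical names:

```scala
// Requires Scala 2.13+ (groupMapReduce). Hypothetical helper names, for illustration only.
object ExplicitChoice {
  // For each distinct second string, keep the smallest first string in a
  // single grouping pass, so no preliminary sort decides which pair wins.
  def smallestFirstPerSecond(tuples: List[(String, String)]): List[(String, String)] =
    tuples
      .groupMapReduce(_._2)(_._1)((a, b) => if (a.compareTo(b) <= 0) a else b) // Map[second, smallest first]
      .toList
      .map { case (second, first) => (first, second) }
      .sorted // tuple ordering: by first string, then second

  def main(args: Array[String]): Unit = {
    val tuples = List("cb" -> "ad", "bc" -> "ad", "ba" -> "cd", "ab" -> "cd")
    println(smallestFirstPerSecond(tuples)) // prints List((ab,cd), (bc,ad))
  }
}
```

Swapping the reduce function (for example, keeping the larger string) changes the rule without touching the rest of the pipeline.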

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM