Comparing Two Seqs in Scala

Here is the scenario: during a data update, I need to determine which entries have changed. I have a Seq of case class objects, each with an ID, and the case class overrides equals so that it compares all attributes except the ID field. To find the updated entries, I need to retrieve the existing data from the DB and compare the two sequences. What is the Scala approach to finding the objects that have been updated?

One thought: create a map with the object ID as the key and the object as the value. That might work.

(Update) Here is the solution I came up with:

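val existingDataList: Seq[Foo] = ... // existing rows loaded from the DB
// Index the existing data by ID (collection.breakOut was removed in Scala 2.13; .toMap works in all versions)
val existingDataMap: Map[Long, Foo] = existingDataList.map(d => d.id -> d).toMap

// To find out updated data: entries whose ID already exists but whose data differs.
// Map.get returns an Option[Foo], so comparing it directly with a Foo is always unequal.
val updatedData = inputData.filter(d => existingDataMap.get(d.id).exists(_ != d))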
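A self-contained sketch of the map-based approach, runnable without a DB. The fields of `Foo` (`name`, `qty`) and the `MapDiff.updated` helper are hypothetical stand-ins, and the DB query is replaced by an in-memory `Seq`. It keeps only entries whose ID already exists and whose data differs; entries with an unknown ID (new rows) are left out.

```scala
// Hypothetical stand-in for the DB entity: equality ignores the ID
final case class Foo(id: Long, name: String, qty: Int) {
  override def equals(that: Any): Boolean = that match {
    case Foo(_, n, q) => n == name && q == qty
    case _            => false
  }
  // Hash only the data fields, consistent with equals
  override def hashCode: Int = (name, qty).##
}

object MapDiff {
  // Returns the input entries whose ID exists in `existing` but whose data has changed
  def updated(existing: Seq[Foo], input: Seq[Foo]): Seq[Foo] = {
    val byId: Map[Long, Foo] = existing.map(d => d.id -> d).toMap
    input.filter(d => byId.get(d.id).exists(_ != d))
  }
}
```

If brand-new rows should count as updates too, `!byId.get(d.id).contains(d)` keeps them as well.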

If I understand you correctly, you've already done most of the hard work by overriding equals; of course, you MUST also override hashCode correspondingly, e.g.:

import scala.runtime.ScalaRunTime

case class Thing(id:Long, foo:String, bar:Int) {
  // Two Things are equal when their data matches; the ID is ignored
  override def equals(that:Any):Boolean = {
    that match {
      case Thing(_, f, b) => (f == this.foo) && (b == this.bar)
      case _ => false
    }
  }

  override def hashCode:Int = {
    // A quick hack (you probably shouldn't do this for real):
    // hash a copy with id set to 0 using the default case-class
    // hashing algorithm, so the hash ignores the ID just as equals does
    ScalaRunTime._hashCode(this.copy(id = 0))
  }
}
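An alternative that is not part of the original answer, offered as a sketch you may prefer: the compiler-generated equals, hashCode, toString and unapply of a case class only consider its first parameter list, so moving `id` into a second parameter list gives data-only equality without any hand-written equals/hashCode. The trade-off is that `id` no longer appears in pattern matches or in toString.

```scala
// Alternative sketch: only the first parameter list (foo, bar) takes part in
// the compiler-generated equals, hashCode, toString and unapply
final case class Thing(foo: String, bar: Int)(val id: Long)

object ThingDemo {
  val t1  = Thing("t1", 1)(111)
  val t1c = Thing("t1", 1)(112)  // same data as t1, new ID
  val t4  = Thing("t4", 4)(444)
  val t4m = Thing("t4m", 4)(444) // same ID as t4, modified foo
}
```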

Now we define a few Thing instances:

val t1 = Thing(111, "t1", 1)
val t1c = Thing(112, "t1", 1) // Same as t1 but with a new ID
val t2 = Thing(222, "t2", 2)
val t3 = Thing(333, "t3", 3)
val t4 = Thing(444, "t4", 4)
val t4m = Thing(444, "t4m", 4)  // Same as t4 but with a modified "foo"

Let's make a couple of sequences:

val orig = Seq(t1, t2, t3, t4)
val mod = Seq(t1c, t2, t3, t4m)

And now a diff tells us everything we need to know:

mod.diff(orig)
// => returns Seq(t4m) - just what we wanted
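A runnable version of the diff example, with one hypothetical brand-new entry `t5` added to the modified sequence. `Seq.diff` is a multiset difference driven by equals and hashCode (which is why the consistent hashCode matters), so it only compares data: an updated entry and a new entry both come back, and nothing in the result says which is which.

```scala
import scala.runtime.ScalaRunTime

// Thing as defined in the answer: equality and hashing ignore the ID
case class Thing(id: Long, foo: String, bar: Int) {
  override def equals(that: Any): Boolean = that match {
    case Thing(_, f, b) => f == foo && b == bar
    case _              => false
  }
  override def hashCode: Int = ScalaRunTime._hashCode(this.copy(id = 0))
}

object DiffDemo {
  val t1  = Thing(111, "t1", 1)
  val t1c = Thing(112, "t1", 1)  // same data as t1, new ID
  val t2  = Thing(222, "t2", 2)
  val t4  = Thing(444, "t4", 4)
  val t4m = Thing(444, "t4m", 4) // same ID as t4, modified foo
  val t5  = Thing(555, "t5", 5)  // hypothetical brand-new entry

  val orig = Seq(t1, t2, t4)
  val mod  = Seq(t1c, t2, t4m, t5)

  // Both the updated t4m and the new t5 are returned
  val changed: Seq[Thing] = mod.diff(orig)
}
```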

So you have two collections, and you want to find pairs of objects that have the same ID but different data, right? diff isn't really what you want here.

Something like this will do it:

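(first ++ second)
  .groupBy(_.id)                                 // ID -> the objects with that ID, first's before second's
  .values
  .filter(v => v.size == 2 && v.head != v.last)  // same ID in both, different data (per the overridden equals)
  .map(v => v.head -> v.last)                    // (object from first, object from second)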

It gives you a collection of (first, second) tuples, where both elements have the same ID but different data.

This assumes that your IDs are unique within each collection and that each ID appears in both collections (an ID present in only one collection simply produces no pair).
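The same grouping idea, wrapped in a small function so it can be run end to end. This is a sketch assuming unique IDs within each collection; `PairDiff` and `changedPairs` are hypothetical names, and `collect` with a `Seq(a, b)` pattern stands in for the size-and-equality filter.

```scala
import scala.runtime.ScalaRunTime

// Thing as defined in the answer: equality and hashing ignore the ID
case class Thing(id: Long, foo: String, bar: Int) {
  override def equals(that: Any): Boolean = that match {
    case Thing(_, f, b) => f == foo && b == bar
    case _              => false
  }
  override def hashCode: Int = ScalaRunTime._hashCode(this.copy(id = 0))
}

object PairDiff {
  // For every ID present once in each collection, returns (old, new) when the data differs.
  // Groups of size 1 (ID in only one collection) don't match the Seq(a, b) pattern.
  def changedPairs(first: Seq[Thing], second: Seq[Thing]): List[(Thing, Thing)] =
    (first ++ second)
      .groupBy(_.id)
      .values
      .collect { case Seq(a, b) if a != b => a -> b }
      .toList
}
```

Because the pairs come from iterating a `Map`, their order is not guaranteed; sort the result by ID if you need a stable order.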

Alternatively, if you can guarantee that the collections are the same size and contain exactly the same set of IDs, you could do something like this, which is less efficient but a bit simpler:

     first.sortBy(_.id)
       .zip(second.sortBy(_.id))
       .filterNot { case (a, b) => a == b }
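A sketch of the sort-and-zip variant as a function. `ZipDiff` is a hypothetical name, and the `require` is an assumption added here as a cheap check of the stated precondition (same size, same set of IDs); it does not catch every duplicate-ID case.

```scala
import scala.runtime.ScalaRunTime

// Thing as defined in the answer: equality and hashing ignore the ID
case class Thing(id: Long, foo: String, bar: Int) {
  override def equals(that: Any): Boolean = that match {
    case Thing(_, f, b) => f == foo && b == bar
    case _              => false
  }
  override def hashCode: Int = ScalaRunTime._hashCode(this.copy(id = 0))
}

object ZipDiff {
  // Pairs up objects by sorted ID and keeps the pairs whose data differs
  def changedPairs(first: Seq[Thing], second: Seq[Thing]): Seq[(Thing, Thing)] = {
    require(
      first.size == second.size && first.map(_.id).toSet == second.map(_.id).toSet,
      "collections must have the same size and the same IDs")
    first.sortBy(_.id)
      .zip(second.sortBy(_.id))
      .filterNot { case (a, b) => a == b }
  }
}
```

Since both sides are sorted by ID, the result comes back in ascending ID order.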

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM