Join Multiple Lists in Scala

I have a series of lists (let's assume the following three) in which the first element of each tuple is a primary key.

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

I would like to full join them to form a new List such as the one below. You may assume that the number of items in each resulting tuple is fixed at 4.

(1, "A", "AA", "AAA")
(2, "B", "BB", ""   )
(3, "C", "CC", "CCC")
(4, "" , "DD", ""   )

How can I achieve this in a functional manner and using Scala?

Let's say you are given input lists such as

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

Then by applying the following function,

List(A,B,C).flatten.groupBy(_._1).map{
  case (k,v) => k :: v.map(_._2)
}

you will get the output

res0: scala.collection.immutable.Iterable[List[Any]] = List(List(2, B, BB), List(4, DD), List(1, A, AA, AAA), List(3, C, CC, CCC))

However, if you still want empty strings in the output, you can try the following:

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

val intermediate = List(A,B,C).flatten.groupBy(_._1).map{
  case (k,v) => k :: v.map(_._2)
}

val maxSize = intermediate.map(_.size).max
// Pad every row with "" up to the length of the longest row.
intermediate.map(x => x.padTo(maxSize, ""))

This gives the output

res0: scala.collection.immutable.Iterable[List[Any]] = List(List(2, B, BB, ), List(4, DD, , ), List(1, A, AA, AAA), List(3, C, CC, CCC))

Tuples come with limitations: their arity is capped at 22 in Scala 2, and building them for a varying number of columns is awkward. For that reason it is advisable to use lists here.
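Note that padding at the end of each row loses the column position: key 4 ends up as List(4, DD, , ) instead of List(4, , DD, ). The following is a minimal sketch of one way to keep positions while still returning lists. It is not part of the original answer, and it assumes each key appears at most once per input list (toMap keeps only the last value for a repeated key). The names tables, keys and joined are chosen for illustration.

```scala
val a = List((1, "A"), (2, "B"), (3, "C"))
val b = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val c = List((1, "AAA"), (3, "CCC"))

// One lookup table per input list; the list's index is its column position.
val tables: List[Map[Int, String]] = List(a, b, c).map(_.toMap)

// Every key that occurs in any list, in ascending order.
val keys: List[Int] = tables.flatMap(_.keys).distinct.sorted

// For each key, take the value from every table, or "" where the key is absent.
val joined: List[List[Any]] =
  keys.map(k => k :: tables.map(_.getOrElse(k, "")))

println(joined)
```

Because every row is built by visiting the tables in order, all rows have the same length and no separate padding step is needed.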

You can also solve this with tail recursion:

import scala.annotation.tailrec

val a = List((1,"A"), (2,"B"), (3,"C"))
val b = List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
val c = List((1,"AAA"), (3,"CCC"))

val lst: List[List[(Int, String)]] = List(a, b, c)

def fun(input: List[List[(Int, String)]]): List[List[Any]] = {
  @tailrec
  def itr(acc: List[List[Any]], key: Int, maxKey: Int): List[List[Any]] =
    if (key > maxKey) acc.reverse
    else {
      // One row per key: the key followed by each list's value, or "" if absent.
      val row: List[Any] = key :: input.map(_.find(_._1 == key).map(_._2).getOrElse(""))
      itr(row :: acc, key + 1, maxKey)
    }

  // Walk every integer key from the smallest to the largest one present,
  // rather than deriving the bounds from the first element and the list lengths.
  val keys = input.flatten.map(_._1)
  if (keys.isEmpty) Nil else itr(Nil, keys.min, keys.max)
}

println(fun(lst))

The output is

List(List(1, A, AA, AAA), List(2, B, BB, ), List(3, C, CC, CCC), List(4, , DD, ))
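The recursion above steps through consecutive integer keys and scans each list with find for every key. Where keys are sparse or are not integers, a fold over the inputs may be a better fit. The following is a minimal sketch under those assumptions, not part of the original answer; joinByFold is a hypothetical name, and rows are returned as (key, Vector of values) pairs so that the element types are kept.

```scala
def joinByFold[K: Ordering](lists: List[List[(K, String)]]): List[(K, Vector[String])] = {
  // A row with one empty slot per input list.
  val blank = Vector.fill(lists.size)("")

  // Visit each list once; col is the list's index and therefore its column.
  val rows = lists.zipWithIndex.foldLeft(Map.empty[K, Vector[String]]) {
    case (acc, (lst, col)) =>
      lst.foldLeft(acc) { case (m, (k, v)) =>
        // Start from a blank row for a new key, then fill in this list's column.
        m.updated(k, m.getOrElse(k, blank).updated(col, v))
      }
  }

  rows.toList.sortBy(_._1)
}

val a = List((1, "A"), (2, "B"), (3, "C"))
val b = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val c = List((1, "AAA"), (3, "CCC"))

println(joinByFold(List(a, b, c)))
```

Each input element is visited exactly once, and only keys that actually occur produce a row.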

As mentioned in a comment, tuples in Scala are subject to limitations and abstracting over their arity can be cumbersome. In case you wish to do so, you may want to have a look at Shapeless.

For a more straightforward (albeit not very clean) solution, the following will do (with implementations for two different target arities):

val a = List((1,"A"), (2,"B"), (3,"C"))
val b = List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
val c = List((1,"AAA"), (3,"CCC"))

def join4[K, V](empty: V)(pss: List[(K, V)]*): List[(K, V, V, V)] =
  pss.reduceOption(_ ++ _).fold(List.empty[(K, V, V, V)])(_.groupBy(_._1).mapValues(_.map(_._2)).collect {
    case (key, Nil) => (key, empty, empty, empty)
    case (key, List(a)) => (key, a, empty, empty)
    case (key, List(a, b)) => (key, a, b, empty)
    case (key, List(a, b, c)) => (key, a, b, c)
    case (key, list) => throw new RuntimeException(s"Group for $key is too long (${list.size} > 3)")
  }.toList)

def join5[K, V](empty: V)(pss: List[(K, V)]*): List[(K, V, V, V, V)] =
  pss.reduceOption(_ ++ _).fold(List.empty[(K, V, V, V, V)])(_.groupBy(_._1).mapValues(_.map(_._2)).collect {
    case (key, Nil) => (key, empty, empty, empty, empty)
    case (key, List(a)) => (key, a, empty, empty, empty)
    case (key, List(a, b)) => (key, a, b, empty, empty)
    case (key, List(a, b, c)) => (key, a, b, c, empty)
    case (key, List(a, b, c, d)) => (key, a, b, c, d)
    case (key, list) => throw new RuntimeException(s"Group for $key is too long (${list.size} > 4)")
  }.toList)

join4("")(a, b, c)
join5("")(a, b, c)

You can play with this code on Scastie.
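Note that grouping the concatenated lists discards which list each value came from, so join4 fills the tuple from the left: key 4 yields (4, "DD", "", "") rather than (4, "", "DD", ""). If the column position matters, the following sketch handles exactly three input lists. It is an illustration rather than part of the original answer: join4Positional is a hypothetical name, and it assumes each key appears at most once per list.

```scala
def join4Positional[K: Ordering, V](empty: V)(
    first: List[(K, V)],
    second: List[(K, V)],
    third: List[(K, V)]
): List[(K, V, V, V)] = {
  // One lookup table per input, so each value keeps its column.
  val (m1, m2, m3) = (first.toMap, second.toMap, third.toMap)

  // Union of all keys, sorted for a stable result order.
  val keys = (m1.keySet ++ m2.keySet ++ m3.keySet).toList.sorted

  keys.map { k =>
    (k, m1.getOrElse(k, empty), m2.getOrElse(k, empty), m3.getOrElse(k, empty))
  }
}

val a = List((1, "A"), (2, "B"), (3, "C"))
val b = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val c = List((1, "AAA"), (3, "CCC"))

println(join4Positional("")(a, b, c))
```

The trade-off is the same as for join4 and join5: every target arity needs its own function, because the tuple type is fixed at compile time.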

Since the question states that "you may assume number of items in the resulting tuples is predetermined to be 4", the following solution, which returns only tuples as requested, works. The given lists are:

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

In the Scala REPL:

scala> val list1 = List(A,B,C).flatten
list1: List[(Int, String)] = List((1,A), (2,B), (3,C), (1,AA), (2,BB), (3,CC), (4,DD), (1,AAA), (3,CCC))

scala> val list2 = List(A,B,C).flatten.map(x=>x._2.toArray).flatten.distinct
list2: List[Char] = List(A, B, C, D)

Then, using the two lists above, the required resultList can be obtained as follows:

scala> val resultList = 
          list2.map(x=>list1.filter(y=>y._2.contains(x))).map{
            case List() =>
            case List((a,b)) => (a,b,"","")
            case List((a,b),(_,c))=>(a,b,c,"")
            case List((a,b),(_,c),(_,d)) =>(a,b,c,d)    
        }
resultList: List[Any] = List((1,A,AA,AAA), (2,B,BB,""), (3,C,CC,CCC), (4,DD,"",""))

However, if we care about the position of the empty string "" in each tuple, the code becomes lengthier, because the pattern match needs guards covering every combination. The guards below identify the source list by the length of the value, which holds only for this sample data:

scala> val resultList =
           list2.map(x=>list1.filter(y=>y._2.contains(x))).map{
       case List() =>
       case List((a,b)) if(b.size==1) => (a,b,"","")
       case List((a,b)) if(b.size==2) => (a,"",b,"")
       case List((a,b)) if(b.size==3) => (a,"","",b)
       case List((a,b),(_,c)) if(b.size==1 && c.size==2)=>(a,b,c,"")
       case List((a,b),(_,c)) if(b.size==2 && c.size==1)=>(a,c,b,"")
       case List((a,b),(_,c)) if(b.size==1 && c.size==3)=>(a,b,"",c)
       case List((a,b),(_,c)) if(b.size==3 && c.size==1)=>(a,c,"",b)
       case List((a,b),(_,c)) if(b.size==2 && c.size==3)=>(a,"",b,c)
       case List((a,b),(_,c)) if(b.size==3 && c.size==2)=>(a,"",c,b)
       case List((a,b),(_,c),(_,d)) if(b.size==1&&c.size==2 && d.size==3)=> 
            (a,b,c,d)
       case List((a,b),(_,c),(_,d)) if(b.size==1&&c.size==3 && d.size==2)=> 
            (a,b,d,c)
       case List((a,b),(_,c),(_,d)) if(b.size==2&&c.size==1&& d.size==3)=>  
            (a,c,b,d)
       case List((a,b),(_,c),(_,d)) if(b.size==2&&c.size==3&& d.size==1)=>  
            (a,d,b,c)
       case List((a,b),(_,c),(_,d)) if(b.size==3&&c.size==1&& d.size==2)=>  
            (a,c,d,b)
       case List((a,b),(_,c),(_,d)) if(b.size==3&&c.size==2&& d.size==1)=>  
            (a,d,c,b)

       }
resultList: List[Any] = List((1,A,AA,AAA), (2,B,BB,""), (3,C,CC,CCC), (4,"",DD,""))

Note, however, that with tuples built this way the type information is lost (the result is a List[Any]), which makes the resulting list difficult to work with. Another data structure, such as a List, may be a better choice. This solution simply follows the requirements stated in the question.
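The List[Any] in the REPL output comes from the `case List() =>` branch, which has no body and therefore evaluates to Unit, so the compiler widens the element type to Any. The following is a minimal sketch, using the same sample data, that drops that branch and uses collect instead of map so that the tuple type is kept. It covers the simpler variant only, so values are still filled from the left.

```scala
val A = List((1, "A"), (2, "B"), (3, "C"))
val B = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val C = List((1, "AAA"), (3, "CCC"))

val list1 = List(A, B, C).flatten
// Distinct characters across all values; each one identifies a row in this sample data.
val list2 = list1.flatMap(_._2.toList).distinct

// collect skips groups that match no case, so every branch returns the same tuple type.
val resultList: List[(Int, String, String, String)] =
  list2.map(ch => list1.filter(_._2.contains(ch))).collect {
    case List((k, v1))                   => (k, v1, "", "")
    case List((k, v1), (_, v2))          => (k, v1, v2, "")
    case List((k, v1), (_, v2), (_, v3)) => (k, v1, v2, v3)
  }

println(resultList)
```

With the element type preserved, fields such as `_._2` can be used on the result without casting.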
