
Convert RDD[(String, String, String)] to RDD[(String, (String, String))] in Spark Scala

I have two RDDs that I am trying to join. The join works when each RDD has two fields per record, but after adding a third field to the existingGTINs RDD I get the error below.

Below is the code:

newGTS.collect()
(00070137115045,00070137115045)
(00799999150451,00799999150451)

existingGTS.collect()
(00799999150451,(00003306-808b-46da-bc7f-419c5ae223a7,2016-10-10 10:23:12.0))
(00016700000653,(00006d79-94ea-4651-be0c-0ce77958cd45,2021-05-31 01:20:39.291))
(00923846453024,(0000704b-b40d-4b9e-b266-f7c66723df0e,null))
(00610074049265,(0000a7a1-587c-4b13-a155-7846df82fdee,2020-03-20 12:16:55.873))
(00034100516079,(0002495f-6084-49dd-aadb-20cd137d9694,null))


val join1 = newGTINs.leftOuterJoin(existingGTINs) mapValues {
      case (gtin, iUUID, createDt) => (iUUID.isEmpty, iUUID.getOrElse(UUID.randomUUID.toString))
    }


 error: constructor cannot be instantiated to expected type;
 found   : (T1, T2, T3)
 required: (String, Option[(String, String)])
                 case (gtin, iUUID, createDt) => (iUUID.isEmpty, iUUID.getOrElse(UUID.randomUUID.toString))
                      ^

PS: UUID.randomUUID.toString generates a random ID.

I am guessing that the newGTINs and existingGTINs used in the join are the same as the newGTS and existingGTS shown with collect().

Since your newGTINs looks to be an RDD[(String, String)] and existingGTINs is an RDD[(String, (String, String))], your newGTINs.leftOuterJoin(existingGTINs) will be an RDD[(String, (String, Option[(String, String)]))].

This means that your mapValues expects, as its parameter, a function that takes a single value of type (String, Option[(String, String)]) and returns SomeNewType. It also accepts a partial function (a { case ... } block) whose pattern has that same shape.

But your { case (gtin, iUUID, createDt) => (iUUID.isEmpty, iUUID.getOrElse(UUID.randomUUID.toString)) } is a partial function whose pattern matches a three-element tuple, so it corresponds to the type (String, String, String) => SomeNewType.
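To make the expected shape concrete, here is a minimal sketch in plain Scala, with no Spark involved. The names JoinValueShape, Joined and describe are made up for illustration; only the tuple type comes from the join above.

```scala
object JoinValueShape {
  // The value type that mapValues sees after the left outer join:
  // a 2-tuple whose second element is an optional nested pair.
  type Joined = (String, Option[(String, String)])

  // A function of the expected shape. The pattern mirrors the nesting:
  // two elements outside, and a pair inside Some(...).
  def describe(value: Joined): String = value match {
    case (gtin, Some((uuid, createDt))) => s"$gtin matched $uuid created $createDt"
    case (gtin, None)                   => s"$gtin has no match"
  }

  def main(args: Array[String]): Unit = {
    val matched: Joined =
      ("00799999150451", Some(("00003306-808b-46da-bc7f-419c5ae223a7", "2016-10-10 10:23:12.0")))
    val unmatched: Joined = ("00070137115045", None)

    println(describe(matched))
    println(describe(unmatched))
  }
}
```

A flat pattern such as case (gtin, uuid, createDt) would not compile against Joined, because the value has two elements, not three.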

Notice the difference, hence the error. You can fix this by providing an appropriate partial function that satisfies the mapValues requirement.

import java.util.UUID

val join1 =
  newGTINs
    .leftOuterJoin(existingGTINs)
    .mapValues {
      // a match exists: iUUID is a plain String here, so reuse it and flag the record as not new
      case (gtin, Some((iUUID, createDt))) =>
        (false, iUUID)
      // what happens for GTINs without a matching element in the existing ones
      case (gtin, None) =>
        (true, UUID.randomUUID.toString)
    }
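If existingGTINs starts out as flat (gtin, uuid, createDt) triples, as the question title suggests, it would first have to be keyed by gtin before the join. On an RDD that would be something like existingTriples.map { case (gtin, uuid, createDt) => (gtin, (uuid, createDt)) }, where existingTriples is a hypothetical name. The sketch below shows the whole flow on local Scala collections, so it runs without Spark. It only emulates leftOuterJoin and mapValues, and the names LeftOuterJoinSketch, keyByGtin and assignIds are assumptions, not part of the original code.

```scala
import java.util.UUID

object LeftOuterJoinSketch {
  // Hypothetical helper: turn (gtin, uuid, createDt) triples into gtin -> (uuid, createDt).
  def keyByGtin(rows: Seq[(String, String, String)]): Map[String, (String, String)] =
    rows.map { case (gtin, uuid, createDt) => gtin -> ((uuid, createDt)) }.toMap

  // Emulates leftOuterJoin followed by mapValues on local collections.
  // Result per key: (isNew, uuid).
  def assignIds(
      newGtins: Seq[(String, String)],
      existing: Map[String, (String, String)]
  ): Seq[(String, (Boolean, String))] =
    newGtins
      // "left outer join": every left record is kept, the right side is an Option
      .map { case (key, gtin) => key -> ((gtin, existing.get(key))) }
      // "mapValues": same two cases as in the fix above
      .map { case (key, value) =>
        key -> (value match {
          case (_, Some((uuid, _))) => (false, uuid)
          case (_, None)            => (true, UUID.randomUUID.toString)
        })
      }

  def main(args: Array[String]): Unit = {
    val newGtins = Seq(
      ("00070137115045", "00070137115045"),
      ("00799999150451", "00799999150451")
    )
    val existing = keyByGtin(Seq(
      ("00799999150451", "00003306-808b-46da-bc7f-419c5ae223a7", "2016-10-10 10:23:12.0"),
      ("00016700000653", "00006d79-94ea-4651-be0c-0ce77958cd45", "2021-05-31 01:20:39.291")
    ))
    assignIds(newGtins, existing).foreach(println)
  }
}
```

With the sample data from the question, the first GTIN has no match and gets a freshly generated ID, while the second one keeps its existing UUID.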


 