
How to fetch data for a column from two tables in Spark Scala

There are two tables, Customer1 and Customer2.

Customer1: List the details of the customer

https://docs.google.com/spreadsheets/d/1GuQaHhZ70D0NHGXuW51B5nNZXrSkthmEduHOhwoZmRg/edit#gid=722500260

Customer2: List the updated details of the customer

https://docs.google.com/spreadsheets/d/1GuQaHhZ70D0NHGXuW51B5nNZXrSkthmEduHOhwoZmRg/edit#gid=0

CustomerName has to be fetched from both tables. If the customer name has been updated, it has to be fetched from the Customer2 table; otherwise it has to be fetched from the Customer1 table. This way every customer name is listed.

Expected result set:

https://docs.google.com/spreadsheets/d/1GuQaHhZ70D0NHGXuW51B5nNZXrSkthmEduHOhwoZmRg/edit#gid=1227228207

How can this be achieved in Spark Scala?

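You can left join the customer1 table to the customer2 table on customerid, then use coalesce to pick the first non-null value for the customername column, checking customer2 first so that an updated name wins and customers without an update fall back to customer1.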

Example :

scala> val customer1=Seq((1,"shiva","9994323565"),(2,"Mani","9994323567"),(3,"Sneha","9994323568")).toDF("customerid","customername","contact")
scala> val customer2=Seq((1,"shivamoorthy","9994323565"),(2,"Manikandan","9994323567")).toDF("customerid","customername","contact")
scala> customer1.as("c1")
       .join(customer2.as("c2"),$"c1.customerid" === $"c2.customerid","left")
       .selectExpr("c1.customerid",
            "coalesce(c2.customername,c1.customername) as customername")
       .show()

Result:

+----------+------------+
|customerid|customername|
+----------+------------+
|         1|shivamoorthy|
|         2|  Manikandan|
|         3|       Sneha|
+----------+------------+
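
The same join can also be written with the typed Column API instead of selectExpr. The following is only a sketch under a few assumptions: it creates its own local SparkSession (in spark-shell, `spark` and its implicits already exist, so those lines can be skipped), the app name is arbitrary, and the extra `updated` column is a hypothetical addition, not part of the question, that simply shows which table each name came from. It also assumes customerid is unique in customer2; duplicate ids there would duplicate rows in the result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col}

// Assumption: a local session for experimentation; not needed inside spark-shell.
val spark = SparkSession.builder()
  .appName("customer-name-merge") // arbitrary name, chosen for this sketch
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the example above.
val customer1 = Seq((1, "shiva", "9994323565"), (2, "Mani", "9994323567"), (3, "Sneha", "9994323568"))
  .toDF("customerid", "customername", "contact")
val customer2 = Seq((1, "shivamoorthy", "9994323565"), (2, "Manikandan", "9994323567"))
  .toDF("customerid", "customername", "contact")

val merged = customer1.as("c1")
  // Left join keeps every customer1 row, even when customer2 has no match.
  .join(customer2.as("c2"), col("c1.customerid") === col("c2.customerid"), "left")
  .select(
    col("c1.customerid"),
    // Prefer the customer2 name; fall back to customer1 when it is null.
    coalesce(col("c2.customername"), col("c1.customername")).as("customername"),
    // Hypothetical helper column: true when a matching customer2 row exists.
    col("c2.customerid").isNotNull.as("updated")
  )
  .orderBy("customerid")

merged.show()
```

Note that coalesce also falls back to the customer1 name when a customer2 row exists but its customername is null, which may or may not be what you want.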

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 