map customer and account data to a case class using spark/scala
So I have a case class CustomerData and a case class AccountData as follows:
case class CustomerData(
customerId: String,
forename: String,
surname: String
)
case class AccountData(
customerId: String,
accountId: String,
balance: Long
)
I need to join these two so that they form the following case class:
case class CustomerAccountOutput(
customerId: String,
forename: String,
surname: String,
//Accounts for this customer
accounts: Seq[AccountData],
//Statistics of the accounts
numberAccounts: Int,
totalBalance: Long,
averageBalance: Double
)
I need to show that if null appears in accountId or balance, then numberAccounts is 0, totalBalance is null, and averageBalance is also null. Replacing the null with 0 is also accepted.
The final result should be something like this:
+----------+-----------+--------+---------------------------------------------------------------------+--------------+------------+-----------------+
|customerId|forename |surname |accounts |numberAccounts|totalBalance|averageBalance |
+----------+-----------+--------+---------------------------------------------------------------------+--------------+------------+-----------------+
|IND0113 |Leonard |Ball |[[IND0113,ACC0577,531]] |1 |531 |531.0 |
|IND0277 |Victoria |Hodges |[[IND0277,null,null]] |0 |null |null |
|IND0055 |Ella |Taylor |[[IND0055,ACC0156,137], [IND0055,ACC0117,148]] |2 |285 |142.5 |
|IND0129   |Christopher|Young   |[[IND0129,null,null]]                                                |0             |null        |null             |
+----------+-----------+--------+---------------------------------------------------------------------+--------------+------------+-----------------+
I have already got the two case classes to join, and here is the code:
val customerDS = customerDF.as[CustomerData]
val accountDS = accountDF.withColumn("balance",'balance.cast("long")).as[AccountData]
//END GIVEN CODE
val customerAccountsDS = customerDF.join(accountDF, customerDF("customerId") === accountDF("customerId"), "leftouter").drop(accountDF.col("customerId"))
How do I go about getting the above result?
You should be able to do it by using the concat_ws and collect_list functions in Spark.
//Creating sample data
case class CustomerData(
customerId: String,
forename: String,
surname: String
)
case class AccountData(
customerId: String,
accountId: String,
balance: Long
)
val customercolumns = Seq("customerId","forename","surname")
val acccolumns = Seq("customerId","accountId","balance")
val custdata = Seq(("IND0113", "Leonard","Ball"), ("IND0277", "Victoria","Hodges"), ("IND0055", "Ella","Taylor"),("IND0129","Christopher","Young")).toDF(customercolumns:_*).as[CustomerData]
val acctdata = Seq(("IND0113","ACC0577",531),("IND0055","ACC0156",137),("IND0055","ACC0117",148)).toDF(acccolumns:_*).as[AccountData]
val customerAccountsDS = custdata.join(acctdata, custdata("customerId") === acctdata("customerId"), "leftouter").drop(acctdata.col("customerId"))
import org.apache.spark.sql.functions._
val result = customerAccountsDS.withColumn("accounts", concat_ws(",", $"customerId", $"accountId",$"balance"))
val finalresult = result.groupBy("customerId","forename","surname").agg(collect_list($"accounts"))
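The concat_ws approach above produces an array of comma-joined strings rather than the array of account structs and the statistics columns the question asks for. A sketch of a fuller aggregation (my own variation, not part of the original answer) would collect whole rows with collect_list(struct(...)) and compute the statistics in the same agg; count ignores nulls, so customers with no accounts get numberAccounts = 0, while sum and avg over all-null balances stay null, matching the expected output:

```scala
import org.apache.spark.sql.functions._

// Sketch, assuming the customerAccountsDS left-outer join from above.
val finalOutput = customerAccountsDS
  .groupBy("customerId", "forename", "surname")
  .agg(
    // Collect each account row as a struct; unmatched customers get [[id,null,null]].
    collect_list(struct($"customerId", $"accountId", $"balance")).as("accounts"),
    count($"accountId").cast("int").as("numberAccounts"), // count() skips nulls, so 0 for no accounts
    sum($"balance").as("totalBalance"),                   // null when every balance is null
    avg($"balance").as("averageBalance")                  // likewise null
  )

finalOutput.show(false)
```

If you then need the typed Dataset, note that mapping to CustomerAccountOutput via .as[...] will fail on the null totalBalance because balance is a non-nullable Long in the case class; coalescing the nulls to 0 (which the question says is also accepted) avoids that.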