Map customer and account data to a case class using Spark/Scala

So I have a case class CustomerData and a case class AccountData, as follows:

case class CustomerData(
  customerId: String,
  forename: String,
  surname: String
)

case class AccountData(
  customerId: String,
  accountId: String,
  balance: Long
)

I need to join these two so that they form the following case class:

case class CustomerAccountOutput(
  customerId: String,
  forename: String,
  surname: String,
  // Accounts for this customer
  accounts: Seq[AccountData],
  // Statistics of the accounts
  numberAccounts: Int,
  totalBalance: Long,
  averageBalance: Double
)

I need to show that if a null appears in accountId or balance, then numberAccounts is 0 and both totalBalance and averageBalance are null. Replacing the nulls with 0 is also acceptable.
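In Spark terms, that rule maps onto the null-skipping aggregate functions. As a minimal sketch of the intended semantics (not a full solution), assuming joined is the customer-to-account left-outer join:

import org.apache.spark.sql.functions._

// count() skips nulls, so customers with no accounts get numberAccounts = 0,
// while sum/avg over an all-null balance column evaluate to null.
val stats = joined.groupBy("customerId", "forename", "surname").agg(
  count("accountId").as("numberAccounts"),
  sum("balance").as("totalBalance"),
  avg("balance").as("averageBalance")
)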

The final result should look like this:

+----------+-----------+--------+---------------------------------------------------------------------+--------------+------------+-----------------+
|customerId|forename   |surname |accounts                                                             |numberAccounts|totalBalance|averageBalance   |
+----------+-----------+--------+---------------------------------------------------------------------+--------------+------------+-----------------+
|IND0113   |Leonard    |Ball    |[[IND0113,ACC0577,531]]                                              |1             |531         |531.0            |
|IND0277   |Victoria   |Hodges  |[[IND0277,null,null]]                                                |0             |null        |null             |
|IND0055   |Ella       |Taylor  |[[IND0055,ACC0156,137], [IND0055,ACC0117,148]]                       |2             |285         |142.5            |
|IND0129   |Christopher|Young   |[[IND0129,null,null]]                                                |0             |null        |null             |
+----------+-----------+--------+---------------------------------------------------------------------+--------------+------------+-----------------+

I have already joined the two case classes; here is the code:

val customerDS = customerDF.as[CustomerData]
val accountDS = accountDF.withColumn("balance", 'balance.cast("long")).as[AccountData]
//END GIVEN CODE

val customerAccountsDS = customerDF
  .join(accountDF, customerDF("customerId") === accountDF("customerId"), "leftouter")
  .drop(accountDF.col("customerId"))

How do I get the result above?

You should be able to do this by using the concat_ws and collect_list functions in Spark.

//Creating sample data
import org.apache.spark.sql.functions._
// spark.implicits._ is already in scope in the spark-shell

case class CustomerData(customerId: String, forename: String, surname: String)
case class AccountData(customerId: String, accountId: String, balance: Long)

val customercolumns = Seq("customerId", "forename", "surname")
val acccolumns = Seq("customerId", "accountId", "balance")

val custdata = Seq(("IND0113", "Leonard", "Ball"), ("IND0277", "Victoria", "Hodges"),
  ("IND0055", "Ella", "Taylor"), ("IND0129", "Christopher", "Young"))
  .toDF(customercolumns: _*).as[CustomerData]
val acctdata = Seq(("IND0113", "ACC0577", 531L), ("IND0055", "ACC0156", 137L),
  ("IND0055", "ACC0117", 148L)).toDF(acccolumns: _*).as[AccountData]

// Left-outer join keeps customers that have no accounts
val customerAccountsDS = custdata
  .join(acctdata, custdata("customerId") === acctdata("customerId"), "leftouter")
  .drop(acctdata.col("customerId"))

// Concatenate each row's fields into one string, then collect per customer
val result = customerAccountsDS.withColumn("accounts",
  concat_ws(",", $"customerId", $"accountId", $"balance"))
val finalresult = result.groupBy("customerId", "forename", "surname")
  .agg(collect_list($"accounts").as("accounts"))
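The strings produced by concat_ws lose the structure of AccountData. If you need the typed CustomerAccountOutput from the question, a sketch along the following lines should work (this builds on customerAccountsDS above and is not part of the original answer; it assumes spark.implicits._ is in scope, as in the spark-shell):

import org.apache.spark.sql.functions._

val outputDS = customerAccountsDS
  .groupBy($"customerId", $"forename", $"surname")
  .agg(
    // collect_list ignores null elements, so customers without accounts
    // get an empty Seq instead of a struct of nulls
    collect_list(when($"accountId".isNotNull,
      struct($"customerId", $"accountId", $"balance"))).as("accounts"),
    // count() skips null accountIds, giving 0 for account-less customers
    count($"accountId").cast("int").as("numberAccounts"),
    // Long/Double fields in the case class cannot hold null, so replace
    // the null sum/avg with 0, which the question explicitly allows
    coalesce(sum($"balance"), lit(0L)).as("totalBalance"),
    coalesce(avg($"balance"), lit(0.0)).as("averageBalance")
  )
  .as[CustomerAccountOutput]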

You can see the output as follows (shown as a screenshot in the original post).
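To print the same result as text, call show with truncation disabled so the full account lists are displayed:

finalresult.show(false)  // truncate = false, so long cells are not cut off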
