
Scala/Spark: different case class based on input data

In Spark/Scala, how can I do data-driven instantiation of case classes?

Explanation: Let's say we have an input dataset of contracts of some kind (e.g. telecom subscriptions) that need to be evaluated somehow. The input dataset contains values such as the date of creation, start of contract validity, end of validity, various amounts, additional options, a family discount, etc., none of which have to be filled (e.g. some contracts have no additional options).

Does it make sense to model all contract types as case classes? One input row from the dataset could be a contract for a fixed line, a mobile number, or some other service. I'd then deduce which details the input row carries and instantiate the appropriate case class using a match expression. Each of these case classes would have a function that returns the value of the contract based on this data plus some static data coming from elsewhere (a lookup table, maybe a key-value map). That function would then be used in a call to the dataset's map. Is there a better way to do this?
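A minimal sketch of this idea using plain Scala collections (on a Spark Dataset the `map` call has the same shape, with an `Encoder` in scope); all names here (`RawRow`, `FixedLine`, `Mobile`, the lookup keys) are invented for illustration:

```scala
// One flat input row; optionFee may be absent for some contracts.
case class RawRow(contractType: String, baseFee: Double, optionFee: Option[Double])

// A sealed family of contract case classes, each knowing how to value itself
// from its own fields plus static lookup data.
sealed trait Contract {
  def value(lookup: Map[String, Double]): Double
}

case class FixedLine(baseFee: Double) extends Contract {
  def value(lookup: Map[String, Double]): Double =
    baseFee * lookup.getOrElse("fixedFactor", 1.0)
}

case class Mobile(baseFee: Double, optionFee: Double) extends Contract {
  def value(lookup: Map[String, Double]): Double =
    (baseFee + optionFee) * lookup.getOrElse("mobileFactor", 1.0)
}

object Contract {
  // Data-driven factory: deduce the concrete case class from the
  // fields actually present in the row.
  def fromRow(r: RawRow): Contract = r match {
    case RawRow("mobile", base, Some(opt)) => Mobile(base, opt)
    case RawRow(_, base, _)                => FixedLine(base)
  }
}

val lookup = Map("fixedFactor" -> 1.5, "mobileFactor" -> 2.0)
val rows   = Seq(RawRow("mobile", 20.0, Some(5.0)), RawRow("fixed", 30.0, None))
// On Spark: ds.map(r => Contract.fromRow(r).value(lookup))
val values = rows.map(r => Contract.fromRow(r).value(lookup))
// values: Seq(50.0, 45.0)
```

Because the trait is sealed, the compiler can warn about match cases missed in `fromRow` as new contract types are added.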

Assuming the case class idea makes sense, each class could also run simulations on the same input data, e.g. what if the customer downgrades their internet speed — what would the estimated income for this contract be? For one input row I'd then have to return two new columns: the value of the contract and the simulated value of the contract. Doing 'what if' scenarios, it could also be that for one input row I run several scenarios (at once?), which would then return several rows (e.g. 1. what if the customer buys something more; 2. what if the customer downgrades; 3. what if the customer cancels all additional options on the contract).
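One way to sketch that fan-out, assuming a hypothetical `Subscription` case class and invented scenario names: each scenario is a named transformation, and `flatMap` turns one input row into one output row per scenario (the same shape works on a Spark Dataset):

```scala
// Hypothetical contract; value = yearly income from the subscription.
case class Subscription(monthlyFee: Double, optionsFee: Double) {
  def value: Double = 12 * (monthlyFee + optionsFee)
}

// Scenarios are data: a name plus a transformation of the contract.
// Which scenarios run could come from configuration/run options.
val scenarios: Seq[(String, Subscription => Subscription)] = Seq(
  "asIs"        -> ((s: Subscription) => s),
  "downgrade"   -> ((s: Subscription) => s.copy(monthlyFee = s.monthlyFee * 0.5)),
  "dropOptions" -> ((s: Subscription) => s.copy(optionsFee = 0.0))
)

val contracts = Seq(Subscription(50.0, 10.0))
// One input row fans out into one output row per scenario.
// On Spark: ds.flatMap { c => scenarios.map { case (n, f) => (n, f(c).value) } }
val results = contracts.flatMap { c =>
  scenarios.map { case (name, f) => (name, f(c).value) }
}
// results: Seq(("asIs",720.0), ("downgrade",420.0), ("dropOptions",600.0))
```

Keeping the scenario list as data rather than hard-coded branches means a single `flatMap` pass covers however many scenarios the run configuration asks for.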

Is this even the right approach to the problem? How can these evaluations be made 'data driven', given that the input values determine which case class to instantiate, while configuration/run options determine how many times a map over the dataset should be triggered?

Modeling a huge number of different product combinations as a class-hierarchy tree is not pragmatic.

The solution that worked is to use nested classes.

So, from one input row, columns are grouped into objects that belong together, and those objects become data members of a parent class.

I tried this on banking contracts rather than the telecom contracts used in the question: if a loan contract arrives as one row in a DataFrame, the columns of that row can be grouped into maturity information, interest information, etc. Each of these information groups gets its own class and methods, and instances of those classes become data members of the parent Loan class.

This way I could model different interest behavior, maturity behavior, etc., and invoke it inside the .map directly from the Loan object.
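A minimal sketch of that nested layout; `LoanRow`, `MaturityInfo`, `InterestInfo`, and the simple-interest formula are all invented here for illustration:

```scala
// Column groups from one flat row, each with its own behavior.
case class MaturityInfo(startYear: Int, endYear: Int) {
  def years: Int = endYear - startYear
}

case class InterestInfo(principal: Double, rate: Double) {
  // Simple-interest sketch; real behavior would vary per product.
  def yearlyInterest: Double = principal * rate
}

// The parent class composes the groups and delegates to them;
// this is the method called from .map.
case class Loan(id: String, maturity: MaturityInfo, interest: InterestInfo) {
  def estimatedIncome: Double = maturity.years * interest.yearlyInterest
}

// One flat input row as it would come from the DataFrame.
case class LoanRow(id: String, startYear: Int, endYear: Int,
                   principal: Double, rate: Double)

// Group the flat columns into the nested structure.
def fromRow(r: LoanRow): Loan =
  Loan(r.id, MaturityInfo(r.startYear, r.endYear),
       InterestInfo(r.principal, r.rate))

val rows = Seq(LoanRow("L-001", 2020, 2025, 10000.0, 0.0625))
// On Spark: ds.map(fromRow).map(_.estimatedIncome)
val incomes = rows.map(fromRow).map(_.estimatedIncome)
// incomes: Seq(3125.0)
```

Swapping in a different interest or maturity behavior then means replacing one small member class instead of growing a product-type hierarchy.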
