
Recursive arithmetical operations with Scala

I have a Scala List object with a recursive definition of all the operations I have to perform on the columns of a Spark DataFrame. For example, the operations

(C1 - C2) + ((C3 - C4) - (C5 - C6))

are defined by the following Scala List:

List("addition", List("substraction", List("C1","C2")),
                 List("substraction",
                                  List("substraction", List("C3","C4")),
                                  List("substraction", List("C5","C6")))
)

where "C1", ..., "C6" are the names of the Spark DataFrame's columns.

I would like to define a recursive Scala function that gives me the final result column.

Does anyone know a way to do it?

The way you define the operation is quite strange. You wrap column-name operands in a list, but not complex operands, so your lists can have either two or three elements. How would you define something like (A + (B - C))? I would start by fixing that and write your operation either like this (3 elements per list):

val list = List("addition",
    List("substraction","C1","C2"),
    List("substraction",
        List("substraction","C3","C4"),
        List("substraction", "C5","C6")
    )
)

or like this (2 elements per list):

val list = List("addition", List(
    List("substraction", List("C1","C2")),
    List("substraction", List(
        List("substraction", List("C3","C4")),
        List("substraction", List("C5","C6"))
    )))
)

The second version being much more verbose, let's pick the first one and write the recursive function:

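import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

def operation_to_col(operation : Any) : Column = {
    operation match {
        // Leaf: a plain column name
        case x : String => col(x)
        // Inner node: combine the two recursively built columns
        case List("addition", s1 : Any, s2 : Any) =>
             operation_to_col(s1) + operation_to_col(s2)
        case List("substraction", s1 : Any, s2 : Any) =>
             operation_to_col(s1) - operation_to_col(s2)
    }
}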
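A quick way to sanity-check this recursion without a Spark session is to run the same pattern match over plain numbers. The sketch below is only an illustration: `evalOperation`, `sampleOperation` and `row` are hypothetical names that are not part of the answer, and the numeric values are made up for the check.

```scala
// Hypothetical helper: evaluates the 3-elements-per-list encoding against
// plain numeric values instead of Spark columns.
def evalOperation(operation: Any, row: Map[String, Double]): Double =
  operation match {
    case name: String => row(name) // leaf: look up the value for this column name
    case List("addition", s1, s2) =>
      evalOperation(s1, row) + evalOperation(s2, row)
    case List("substraction", s1, s2) =>
      evalOperation(s1, row) - evalOperation(s2, row)
  }

val sampleOperation = List("addition",
  List("substraction", "C1", "C2"),
  List("substraction",
    List("substraction", "C3", "C4"),
    List("substraction", "C5", "C6")))

// Illustrative values for one row
val row = Map("C1" -> 1000.0, "C2" -> 1.0, "C3" -> 2.0,
              "C4" -> 3.0, "C5" -> 4.0, "C6" -> 5.0)

// (1000 - 1) + ((2 - 3) - (4 - 5)) = 999
val checked = evalOperation(sampleOperation, row)
```

The structure mirrors operation_to_col one to one, so a wrong operator in one of the cases would show up here as a wrong number.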

First, I am going to change the definition of the operations. For example, the operations

(C1 - C2) + ((C3 - C4) - (C5 - C6))

are defined by the following Scala List:

val list = List("addition",
        List("substraction","C1","C2"),
        List("substraction",
              List("substraction","C3","C4"),
              List("substraction", "C5","C6")
        )
)

I am going to create a DataFrame as an example:

// Needed for toDF (already in scope in spark-shell)
import spark.implicits._

val data = Seq((1000, 1, 2, 3, 4, 5), (2000, 1, 2, 3, 4, 5), (3000, 1, 2, 3, 4, 5))
val rdd = spark.sparkContext.parallelize(data)
val df = rdd.toDF("C1","C2","C3","C4","C5","C6")

The list of permitted operations is:

val operations = List("addition","substraction","multiplication","division")

I created the following Map object to associate the operations with their symbols:

val oprSimbols:Map[String,String] = Map("addition"->"+", "substraction"-> "-", "multiplication"->"*","division"->"/")

Finally, I define the function that solves the problem:

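import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

def operation_to_col(df: DataFrame, oprSimbols: Map[String,String],
     operations: List[String], list: Any): DataFrame = {
     list match {
        // Leaf: a column name, i.e. any string that is not an operation name
        case x: String if !operations.contains(x) => df.select(col(x))

        // Both operands are column names: build and evaluate "x <symbol> y"
        case List(oprName: String, x: String, y: String) => {
           val sym = oprSimbols(oprName)
           val exprOpr = List(x, sym, y).mkString(" ")
           df.selectExpr(exprOpr)
        }

        // General case: resolve each operand recursively, then combine the
        // names of the resulting columns into one expression over df
        case List(oprName: String, s1: Any, s2: Any) => {
           val df1 = operation_to_col(df, oprSimbols, operations, s1)
           val df2 = operation_to_col(df, oprSimbols, operations, s2)
           val sym = oprSimbols(oprName)
           val exprOpr = List(df1.columns(0), sym, df2.columns(0)).mkString(" ")
           df.selectExpr(exprOpr)
        }
     }
}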

We can check it:

operation_to_col(df,oprSimbols, operations, list )
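Because this approach ultimately hands a SQL expression string to selectExpr, the string-building step can also be separated from the DataFrame. The following is a minimal sketch of that idea, not the answer's own code: `toSqlExpr` and `symbols` are hypothetical names, and it assumes the three-element encoding and column names that need no quoting.

```scala
// Hypothetical lookup table from operation names to SQL operators
val symbols = Map("addition" -> "+", "substraction" -> "-",
                  "multiplication" -> "*", "division" -> "/")

// Hypothetical helper: turns the nested list into one SQL expression string
def toSqlExpr(operation: Any): String = operation match {
  case name: String => name // leaf: a column name, emitted as-is
  case List(opr: String, s1, s2) =>
    // Parenthesise every sub-expression so precedence is explicit
    s"(${toSqlExpr(s1)} ${symbols(opr)} ${toSqlExpr(s2)})"
}

val sqlExpr = toSqlExpr(List("addition",
  List("substraction", "C1", "C2"),
  List("substraction",
    List("substraction", "C3", "C4"),
    List("substraction", "C5", "C6"))))
// sqlExpr == "((C1 - C2) + ((C3 - C4) - (C5 - C6)))"
```

The resulting string could then be passed to df.selectExpr(sqlExpr) in a single call, which avoids building intermediate DataFrames just to read their column names.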
