
Finding the sum in a Spark Scala DataFrame with delimited values

I have the following DataFrame:

val df1 = Seq(
  ("1_2_3", "5_10"),
  ("4_5_6", "15_20")
).toDF("c1", "c2")

+-----+-----+
|   c1|   c2|
+-----+-----+
|1_2_3| 5_10|
|4_5_6|15_20|
+-----+-----+

How do I get the sum in a separate column, based on these conditions:

- Omit the third value after the delimiter '_' in the first column, i.e. drop the '_3' and '_6' from 1_2_3 and 4_5_6.
- Add the remaining values position by position: 1+5 and 2+10 for the first row, 4+15 and 5+20 for the second.

Expected output -

+-----+-----+-----+
|   c1|   c2|  res|
+-----+-----+-----+
|1_2_3| 5_10| 6_12|
|4_5_6|15_20|19_25|
+-----+-----+-----+

Try this:

zip_with + split

zip_with pads the shorter array with nulls before applying the lambda, and concat_ws skips nulls, so the third element of c1 ('_3' and '_6') is dropped automatically.

    val df1 = Seq(
      ("1_2_3", "5_10"),
      ("4_5_6", "15_20")
    ).toDF("c1", "c2")
    df1.show(false)

    df1.withColumn("res",
      expr("concat_ws('_', zip_with(split(c1, '_'), split(c2, '_'), (x, y) -> cast(x+y as int)))"))
      .show(false)

    /**
      * +-----+-----+-----+
      * |c1   |c2   |res  |
      * +-----+-----+-----+
      * |1_2_3|5_10 |6_12 |
      * |4_5_6|15_20|19_25|
      * +-----+-----+-----+
      */
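
If you prefer the typed Column API over an expression string, the same logic can be written with org.apache.spark.sql.functions.zip_with, which is available from Spark 3.0. This is only a sketch of the same technique, not part of the original answer; the extra cast to string just makes the element type explicit for concat_ws:

    import org.apache.spark.sql.functions._

    // Element-wise sum via the Column-based zip_with (Spark 3.0+).
    // The null produced for the unmatched third element passes through the
    // lambda and is then skipped by concat_ws, dropping the '_3'/'_6' part.
    df1.withColumn("res",
      concat_ws("_",
        zip_with(split(col("c1"), "_"), split(col("c2"), "_"),
          (x, y) => (x + y).cast("int").cast("string"))))
      .show(false)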

Update: doing it dynamically for 50 columns

    val end = 51 // 50 cols
    val df = spark.sql("select '1_2_3' as c1")
    val new_df = Range(2, end).foldLeft(df){(df, i) => df.withColumn(s"c$i", $"c1")}
    new_df.show(false)
    /**
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      * |c1   |c2   |c3   |c4   |c5   |c6   |c7   |c8   |c9   |c10  |c11  |c12  |c13  |c14  |c15  |c16  |c17  |c18  |c19  |c20  |c21  |c22  |c23  |c24  |c25  |c26  |c27  |c28  |c29  |c30  |c31  |c32  |c33  |c34  |c35  |c36  |c37  |c38  |c39  |c40  |c41  |c42  |c43  |c44  |c45  |c46  |c47  |c48  |c49  |c50  |
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      * |1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      */
    val res = new_df.withColumn("res", $"c1")
    Range(2, end).foldLeft(res){(df4, i) =>
      df4.withColumn("res",
        expr(s"concat_ws('_', zip_with(split(res, '_'), split(${s"c$i"}, '_'), (x, y) -> cast(x+y as int)))"))
    }
      .show(false)
    /**
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
      * |c1   |c2   |c3   |c4   |c5   |c6   |c7   |c8   |c9   |c10  |c11  |c12  |c13  |c14  |c15  |c16  |c17  |c18  |c19  |c20  |c21  |c22  |c23  |c24  |c25  |c26  |c27  |c28  |c29  |c30  |c31  |c32  |c33  |c34  |c35  |c36  |c37  |c38  |c39  |c40  |c41  |c42  |c43  |c44  |c45  |c46  |c47  |c48  |c49  |c50  |res       |
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
      * |1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|50_100_150|
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
      */
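
If the columns already exist in the DataFrame rather than being generated, the same fold can run over df.columns instead of a hard-coded Range. A minimal sketch, assuming every column of the DataFrame holds '_'-delimited numbers (true for the toy new_df above):

    import org.apache.spark.sql.functions.{col, expr}

    // Seed res with the first column, then fold the remaining columns into it
    // using the same zip_with expression as above.
    val valueCols = new_df.columns          // Array("c1", "c2", ..., "c50")
    val summed = valueCols.tail.foldLeft(new_df.withColumn("res", col(valueCols.head))) { (acc, c) =>
      acc.withColumn("res",
        expr(s"concat_ws('_', zip_with(split(res, '_'), split($c, '_'), (x, y) -> cast(x + y as int)))"))
    }
    summed.show(false)                      // res is 50_100_150, as above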
