
How to access columns by index in a Spark DataFrame in Scala for calculations

I am new to Scala. I have worked extensively in R, but in Scala I am finding it hard to loop over a DataFrame and pick out specific columns so that I can compute on their values.

Let me explain with an example.

I have a final DataFrame produced by joining two DataFrames, and I now need to perform a calculation like this: [image: enter image description here]

The computation above is expressed in terms of the columns. After running it, we should get the Spark DataFrame below:

[image: enter image description here]

How can I refer to columns by index in a for loop to compute the new column values in a Spark DataFrame in Scala?

Here is one solution:

Input Data:
+---+---+---+---+---+---+---+---+---+
|a1 |b1 |c1 |d1 |e1 |a2 |b2 |c2 |d2 |
+---+---+---+---+---+---+---+---+---+
|24 |74 |74 |21 |66 |65 |100|27 |19 |
+---+---+---+---+---+---+---+---+---+

Zip the sorted "1" and "2" column names to pair them up. zip stops at the end of the shorter list, which drops the unmatched column (e1):

val oneCols = data.schema.filter(_.name.contains("1")).map(x => x.name).sorted
val twoCols = data.schema.filter(_.name.contains("2")).map(x => x.name).sorted
val cols = oneCols.zip(twoCols) 

//cols: Seq[(String, String)] = List((a1,a2), (b1,b2), (c1,c2), (d1,d2))
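
One caveat with the sorted zip: it only lines up correctly here because the unmatched column (e1) sorts last. If, say, b2 were the missing one, b1 would get paired with c2. A minimal sketch of pairing by name instead, in plain Scala with no Spark needed, assuming every column ends in "1" or "2" as in the example (the PairColumns object and pairColumns helper are hypothetical names, not part of the original answer):

```scala
object PairColumns {
  // Hypothetical helper: pair each "x1" column with "x2" only if "x2" exists.
  def pairColumns(names: Seq[String]): Seq[(String, String)] = {
    val present = names.toSet
    names
      .filter(_.endsWith("1"))
      .flatMap { one =>
        val two = one.stripSuffix("1") + "2"
        // Drop columns that have no counterpart (e1 in the example).
        if (present.contains(two)) Some(one -> two) else None
      }
  }
}

// In Spark the names would come from data.columns.
val cols = PairColumns.pairColumns(
  Seq("a1", "b1", "c1", "d1", "e1", "a2", "b2", "c2", "d2"))
// cols: List((a1,a2), (b1,b2), (c1,c2), (d1,d2))
```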

Use foldLeft to add the new columns dynamically, one per pair:

import org.apache.spark.sql.functions.col
// For each (x1, x2) pair, add Diff_x1 = (x2 - x1) / x2.
val result = cols.foldLeft(data) { case (df, (one, two)) =>
  df.withColumn(s"Diff_$one", (col(two) - col(one)) / col(two))
}
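
If the foldLeft is unfamiliar coming from R, it may help to see the same pattern without Spark. This is only an analogue, assuming a single row held in a plain Map (the FoldLeftSketch object is a made-up name): the accumulator starts as the original row, and each step returns a new Map with one more Diff_ entry, much like each withColumn call returns a new DataFrame.

```scala
object FoldLeftSketch {
  // The single input row from the example, keyed by column name.
  val row: Map[String, Double] = Map(
    "a1" -> 24.0, "b1" -> 74.0, "c1" -> 74.0, "d1" -> 21.0, "e1" -> 66.0,
    "a2" -> 65.0, "b2" -> 100.0, "c2" -> 27.0, "d2" -> 19.0)

  val pairs: Seq[(String, String)] =
    Seq("a1" -> "a2", "b1" -> "b2", "c1" -> "c2", "d1" -> "d2")

  // Start from the original row and add one Diff_* entry per pair.
  val result: Map[String, Double] =
    pairs.foldLeft(row) { case (acc, (one, two)) =>
      val diff = (acc(two) - acc(one)) / acc(two)
      acc + (s"Diff_$one" -> diff)
    }
}

// Diff_b1 is (100 - 74) / 100 = 0.26
println(FoldLeftSketch.result("Diff_b1"))
```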

Here is the result:

result.show(false)  

+---+---+---+---+---+---+---+---+---+------------------+-------+-------------------+--------------------+
|a1 |b1 |c1 |d1 |e1 |a2 |b2 |c2 |d2 |Diff_a1           |Diff_b1|Diff_c1            |Diff_d1             |
+---+---+---+---+---+---+---+---+---+------------------+-------+-------------------+--------------------+
|24 |74 |74 |21 |66 |65 |100|27 |19 |0.6307692307692307|0.26   |-1.7407407407407407|-0.10526315789473684|
+---+---+---+---+---+---+---+---+---+------------------+-------+-------------------+--------------------+
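
Since the question asks about column indexes specifically: df.columns returns the column names as an Array[String] in schema order, so positions can be turned into names and then used with col. A rough sketch under the assumption that the layout is fixed as in the example, with the "2" columns starting five positions after the "1" columns (IndexBasedDiff, addDiffsByIndex, count and offset are names made up for illustration):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IndexBasedDiff {
  // Hypothetical helper: pairs column i with column i + offset, by position.
  def addDiffsByIndex(df: DataFrame, count: Int, offset: Int): DataFrame = {
    val names = df.columns // Array[String], in schema order
    (0 until count).foldLeft(df) { (acc, i) =>
      val one = names(i)
      val two = names(i + offset)
      acc.withColumn(s"Diff_$one", (col(two) - col(one)) / col(two))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("index-diff")
      .getOrCreate()
    import spark.implicits._

    val data = Seq((24, 74, 74, 21, 66, 65, 100, 27, 19))
      .toDF("a1", "b1", "c1", "d1", "e1", "a2", "b2", "c2", "d2")

    // Columns 0..3 (a1..d1) pair with columns 5..8 (a2..d2), so offset = 5.
    addDiffsByIndex(data, count = 4, offset = 5).show(false)
    spark.stop()
  }
}
```

Positional access like this breaks silently if the join ever changes the column order, which is why pairing by name is usually the safer choice.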


 