Change a DataFrame row value with a dynamic number of columns (Spark/Scala)
I have a DataFrame with 10 columns, and I want to change a row's value in the last column only. I have written the following code for this:
val newDF = spark.sqlContext.createDataFrame(WRADF.rdd.map(r => {
  Row(r.get(0), r.get(1), r.get(2), r.get(3), r.get(4),
      r.get(5), r.get(6), r.get(7), r.get(8), decrementCounter(r))
}), WRADF.schema)
I only want to change the value of the 10th column (which is what the decrementCounter() function is for), but the code above only works for DataFrames with exactly 10 columns. I don't know how to adapt it so that it runs for DataFrames with a different number of columns. Any help will be appreciated.
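The literal fix for the hard-coded r.get(0) … r.get(8) list is to rebuild each row generically from its fields, replacing only the last one. A minimal sketch of that idea, written in plain Scala so it runs without a SparkSession (the Row.toSeq/Row.fromSeq calls in the comments are the real Spark equivalents, and decrementCounter here is a stand-in for the asker's function):

```scala
// Hypothetical sketch: with Spark, the generic version of the code above is
//   spark.createDataFrame(
//     WRADF.rdd.map(r => Row.fromSeq(r.toSeq.updated(r.length - 1, decrementCounter(r)))),
//     WRADF.schema)
// i.e. copy every field via toSeq and replace only the last one, so the
// column count no longer matters. The same pattern on a plain Seq:
def decrementCounter(values: Seq[Any]): Int =
  values.last.asInstanceOf[Int] - 1          // stand-in for the asker's logic

val row: Seq[Any] = Seq("a", "b", 5)         // stand-in for a 3-column Row
val updated = row.updated(row.length - 1, decrementCounter(row))
println(updated)                             // List(a, b, 4)
```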
Don't do it like this. Define a udf instead:
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for the $"..." column syntax

val decrementCounter = udf((x: T) => ...) // adjust types and content to your requirements
df.withColumn("someName", decrementCounter($"someColumn"))
I think a UDF is the better choice here, because it can be applied to a column by name. For more on UDFs, take a look here: https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html
For your code, just use this:
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for the $"..." column syntax

val decrementCounterUDF = udf(decrementCounter _)
df.withColumn("columnName", decrementCounterUDF($"columnName"))
What this does is apply the decrementCounter function to each and every value of the column columnName.
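To also make the UDF answer independent of the column count, the last column's name can be looked up from df.columns instead of being hard-coded. A minimal sketch, again in plain Scala so it runs without a SparkSession; the withColumn/col calls in the comments are the real Spark equivalents, and every name here is a stand-in:

```scala
// Hypothetical sketch: the Spark version, generic over the column count, is
//   val lastCol = df.columns.last
//   df.withColumn(lastCol, decrementCounterUDF(col(lastCol)))
// Here the "apply a function to the last column of every row" idea is
// modelled with ordinary collections.
def decrementCounter(counter: Int): Int = counter - 1

val columns = Seq("id", "name", "counter")       // stand-in for df.columns
val rows = Seq(Seq(1, "x", 5), Seq(2, "y", 3))   // stand-in for the DataFrame
val lastIdx = columns.length - 1                 // works for any column count

val decremented = rows.map { r =>
  r.updated(lastIdx, decrementCounter(r(lastIdx).asInstanceOf[Int]))
}
println(decremented)
```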
I hope this helps, cheers!