Change a DataFrame row value with a dynamic number of columns (Spark/Scala)
I have a DataFrame with 10 columns, and I want to change a row's value in the last column only. I have written the following code for this:
val newDF = spark.sqlContext.createDataFrame(WRADF.rdd.map(r => {
  Row(r.get(0), r.get(1), r.get(2), r.get(3), r.get(4),
      r.get(5), r.get(6), r.get(7), r.get(8), decrementCounter(r))
}), WRADF.schema)
I only want to change the value of the 10th column (which is what the decrementCounter() function is for), but the code above only works for DataFrames with exactly 10 columns. I don't know how to adapt it so that it runs for DataFrames with a different number of columns. Any help will be appreciated.
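The literal fix for the hard-coded r.get(0) … r.get(8) list is to rebuild each row generically from its fields, replacing only the last one. A minimal sketch of that idea, written in plain Scala so it runs without a SparkSession (the Row.toSeq/Row.fromSeq calls in the comments are the real Spark equivalents, and decrementCounter here is a stand-in for the asker's function):

```scala
// Hypothetical sketch: with Spark, the generic version of the code above is
//   spark.createDataFrame(
//     WRADF.rdd.map(r => Row.fromSeq(r.toSeq.updated(r.length - 1, decrementCounter(r)))),
//     WRADF.schema)
// i.e. copy every field via toSeq and replace only the last one, so the
// column count no longer matters. The same pattern on a plain Seq:
def decrementCounter(values: Seq[Any]): Int =
  values.last.asInstanceOf[Int] - 1          // stand-in for the asker's logic

val row: Seq[Any] = Seq("a", "b", 5)         // stand-in for a 3-column Row
val updated = row.updated(row.length - 1, decrementCounter(row))
println(updated)                             // List(a, b, 4)
```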
Don't do it like this. Define a udf instead:
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for the $"..." column syntax

val decrementCounter = udf((x: T) => ...) // adjust types and content to your requirements
df.withColumn("someName", decrementCounter($"someColumn"))
I think a UDF is the better choice here, because it can be applied to a column by name. For more on UDFs, take a look here: https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html
For your code, just use this:
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for the $"..." column syntax

val decrementCounterUDF = udf(decrementCounter _)
df.withColumn("columnName", decrementCounterUDF($"columnName"))
What this does is apply the decrementCounter function to each and every value of the column columnName.
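To also make the UDF answer independent of the column count, the last column's name can be looked up from df.columns instead of being hard-coded. A minimal sketch, again in plain Scala so it runs without a SparkSession; the withColumn/col calls in the comments are the real Spark equivalents, and every name here is a stand-in:

```scala
// Hypothetical sketch: the Spark version, generic over the column count, is
//   val lastCol = df.columns.last
//   df.withColumn(lastCol, decrementCounterUDF(col(lastCol)))
// Here the "apply a function to the last column of every row" idea is
// modelled with ordinary collections.
def decrementCounter(counter: Int): Int = counter - 1

val columns = Seq("id", "name", "counter")       // stand-in for df.columns
val rows = Seq(Seq(1, "x", 5), Seq(2, "y", 3))   // stand-in for the DataFrame
val lastIdx = columns.length - 1                 // works for any column count

val decremented = rows.map { r =>
  r.updated(lastIdx, decrementCounter(r(lastIdx).asInstanceOf[Int]))
}
println(decremented)
```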
I hope this helps, cheers!