Mapping List items to org.apache.spark.sql.Column type
I am trying to sum a list of columns in my DataFrame (type org.apache.spark.sql.DataFrame), putting the result in a new column 'sums' of a new DataFrame 'out'.
If I list the columns manually, I can do this easily. For example, this works:
val columnsToSum = List(col("led zeppelin"), col("lenny kravitz"), col("leona lewis"), col("lily allen"))
val out = df3.withColumn("sums", columnsToSum.reduce(_ + _))
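However, if I try to do this by pulling the column names directly from the DataFrame, the items in the list are plain strings rather than Column objects, and the same approach fails. For example:
val columnsToSum = df2.schema.fields.filter(f => f.dataType.isInstanceOf[StringType]).map(_.name).patch(0, Nil, 1).toList // patch(0, Nil, 1) drops the first name ("user")
println(columnsToSum)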
>> List(a perfect circle, abba, ac/dc, adam green, aerosmith, afi, ...
// Trying the same method
val out = df3.withColumn("sums", columnsToSum.reduce(_ + _))
>> found   : String
   required: org.apache.spark.sql.Column
   val out = df3.withColumn("sums", columnsToSum.reduce(_ + _))
How can I perform this kind of type conversion? I have tried:
List(a perfect circle, abba, ac/dc, ...).map(_.Column)
List(a perfect circle, abba, ac/dc, ...).map(_.spark.sql.Column)
List(a perfect circle, abba, ac/dc, ...).map(_.org.apache.spark.sql.Column)
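None of these worked. Thanks in advance.
You can use the function col to get a Column object from a string (in fact, you already use it in your first snippet).
So this should work: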
columnsToSum.map(col).reduce(_ + _)
Or, in a more verbose form:
columnsToSum.map(c => col(c)).reduce(_ + _)
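For completeness, here is a minimal self-contained sketch of the same idea. The object name SumColumns, the helper withSums, and the toy data are hypothetical and only stand in for the DataFrames in the question. It also assumes the columns being summed are numeric, which is why it filters on NumericType rather than StringType as the question does.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.NumericType

object SumColumns {
  // Hypothetical helper: turn each column name into a Column with col,
  // then fold the Columns together with +. `names` must be non-empty,
  // because reduce throws on an empty collection.
  def withSums(df: DataFrame, names: Seq[String]): DataFrame =
    df.withColumn("sums", names.map(col).reduce(_ + _))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sum-columns")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the DataFrame in the question (an assumption).
    val df = Seq(("u1", 1, 2, 3), ("u2", 4, 5, 6))
      .toDF("user", "abba", "aerosmith", "led zeppelin")

    // Take the names of all numeric columns straight from the schema;
    // the string column "user" is excluded by the type filter.
    val namesToSum = df.schema.fields
      .filter(_.dataType.isInstanceOf[NumericType])
      .map(_.name)
      .toList

    withSums(df, namesToSum).show()
    spark.stop()
  }
}
```

Filtering by type avoids having to drop "user" by position with patch, so the result does not depend on column order.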