org.apache.spark.sql.AnalysisException:

 df.withColumn(x, when($"x" > 75, $"x" + 10).otherwise($"x")).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`x`' given input columns: [Name, Subject, Marks];;
'Project [Name#7, Subject#8, CASE WHEN ('x > 75) THEN ('x + 10) ELSE 'x END AS Marks#38]

scala> df.show()
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
| Ram|Physics|   80|
|Sham|English|   90|
|Ayan|   Math|   70|
+----+-------+-----+


scala> x
res6: String = Marks

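I want to pass a variable as a parameter that holds a DataFrame column name. Based on that parameter, the code should check a condition, compute a value, and replace the DataFrame column of the same name.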

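The bigger problem is actually that there are many such columns, e.g. "col1", "col2", "col3", ... I plan to store these column names in an array and run the logic over the DataFrame by passing each value of the array. For now, though, please tell me how to solve this in spark-scala, if it can be handled there.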

Try using string interpolation: col(s"${x}")

Example:

val df=Seq(("Ram","Physics",80),("Sham","English",90),("Ayan","Math",70)).toDF("Name","Subject","Marks")

df.show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics|   80|
//|Sham|English|   90|
//|Ayan|   Math|   70|
//+----+-------+-----+

import org.apache.spark.sql.functions._
val x:String = "Marks"

df.withColumn(x, when(col(s"${x}") > 75, col(s"${x}") + 10).otherwise(col(s"${x}"))).show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics|   90|
//|Sham|English|  100|
//|Ayan|   Math|   70|
//+----+-------+-----+
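Why the original $"x" failed: the $"..." syntax from spark.implicits._ is a string interpolator that, as far as I know from the Spark source, passes the ordinary s-interpolated string to ColumnName. So $"x" always names a column literally called x, while $"$x" substitutes the variable's value. The plain-Scala sketch below (no Spark needed) just prints the strings each form would hand to Spark; it assumes the same x = "Marks" as above.

```scala
object InterpolationDemo {
  val x: String = "Marks"

  // Spark's $"..." is assumed to pass the result of the standard `s`
  // interpolation to ColumnName, so these strings are the names Spark looks up.
  val literal: String = s"x"       // what $"x" asks for: a column literally named "x"
  val interpolated: String = s"$x" // what $"$x" asks for: the value of x, "Marks"

  def main(args: Array[String]): Unit = {
    println(literal)      // x
    println(interpolated) // Marks
  }
}
```

If that holds, the original line should also work when written as df.withColumn(x, when($"$x" > 75, $"$x" + 10).otherwise($"$x")), which is equivalent to the col(x) form; treat this as a sketch rather than something verified against a specific Spark version.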

Use functions.col, as follows:

df1.show(false)

/**
  * +----+-------+-----+
  * |Name|Subject|Marks|
  * +----+-------+-----+
  * |Ram |Physics|80   |
  * |Sham|English|90   |
  * |Ayan|Math   |70   |
  * +----+-------+-----+
  */
import org.apache.spark.sql.functions.{col, when}

val x = "Marks"
// use functions.col: col(x) looks up the column whose name is stored in x
df1.withColumn(x, when(col(x) > 75, col(x) + 10).otherwise(col(x)))
  .show()

/**
  * +----+-------+-----+
  * |Name|Subject|Marks|
  * +----+-------+-----+
  * | Ram|Physics|   90|
  * |Sham|English|  100|
  * |Ayan|   Math|   70|
  * +----+-------+-----+
  */
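The question also mentions applying the same rule to several columns ("col1", "col2", "col3", ...) whose names are kept in an array. A minimal sketch, assuming a local SparkSession and made-up data with hypothetical columns col1 to col3: foldLeft over the array of names and call withColumn once per name, reusing the col(name) pattern from this answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object MultiColumnDemo {
  // The rule from the question: add 10 when the value exceeds 75, else keep it.
  def bump(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, when(col(name) > 75, col(name) + 10).otherwise(col(name)))

  def run(spark: SparkSession): Array[(String, Int, Int, Int)] = {
    import spark.implicits._
    // Illustrative data; col1..col3 are placeholder names, not from the original post.
    val df = Seq(("Ram", 80, 70, 90), ("Sham", 60, 95, 75))
      .toDF("Name", "col1", "col2", "col3")
    val columnsToUpdate = Array("col1", "col2", "col3")
    // Thread the DataFrame through bump once per column name.
    val updated = columnsToUpdate.foldLeft(df)(bump)
    // Tuple encoders map columns by position: Name, col1, col2, col3.
    updated.as[(String, Int, Int, Int)].collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("multi-column").getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Each withColumn call adds one projection to the plan; for a handful of columns that is fine, and for very many columns a single select over all of them avoids a deep chain of projections.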

For clarity, I split the columns into requiredColumns and allColumns.

Check the code below.

scala> df.show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|80   |
|Sham|English|90   |
|Ayan|Math   |70   |
+----+-------+-----+
scala> val requiredColumns = Set("Marks")
requiredColumns: scala.collection.immutable.Set[String] = Set(Marks)
scala> val allColumns = df.columns
allColumns: Array[String] = Array(Name, Subject, Marks)
scala> 
val columnExpr = allColumns
                    .filterNot(requiredColumns(_))
                    .map(col(_)) ++ requiredColumns
                    .map(c => when(col(c) > 75, col(c) + 10).otherwise(col(c)).as(c))

Output

scala> df.select(columnExpr:_*).show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|90   |
|Sham|English|100  |
|Ayan|Math   |70   |
+----+-------+-----+
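One side effect of the filterNot ... ++ requiredColumns construction is that the rewritten columns are moved to the end of the select list; the output only matches the original layout here because Marks is already the last column. A sketch that keeps the original column order, assuming the same threshold and a Set of column names to rewrite, maps over df.columns once and rewrites only the names found in the set. The object and method names are illustrative.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object OrderPreservingSelect {
  // Rewrite only the columns named in `required`; every other column is kept
  // unchanged and in the same position as in the input DataFrame.
  def transform(df: DataFrame, required: Set[String]): DataFrame = {
    val columnExpr: Array[Column] = df.columns.map { c =>
      if (required(c)) when(col(c) > 75, col(c) + 10).otherwise(col(c)).as(c)
      else col(c)
    }
    df.select(columnExpr: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("order-preserving").getOrCreate()
    import spark.implicits._
    try {
      // Marks is placed in the middle on purpose, to show that its position is kept.
      val df = Seq(("Ram", 80, "Physics"), ("Sham", 90, "English"), ("Ayan", 70, "Math"))
        .toDF("Name", "Marks", "Subject")
      transform(df, Set("Marks")).show(false)
    } finally spark.stop()
  }
}
```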
