Exception in thread "main" org.apache.spark.sql.AnalysisException:
df.withColumn(x, when($"x" > 75, $"x" + 10).otherwise($"x")).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`x`' given input columns: [Name, Subject, Marks];;
'Project [Name#7, Subject#8, CASE WHEN ('x > 75) THEN ('x + 10) ELSE 'x END AS Marks#38]
scala> df.show()
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
| Ram|Physics| 80|
|Sham|English| 90|
|Ayan| Math| 70|
+----+-------+-----+
scala> x
res6: String = Marks
I want to pass a variable as a parameter that stores a column name of the dataframe. Based on that parameter, it should check the condition, compute the value, and replace the column of the same name in the dataframe.
Actually the bigger problem is that there are multiple such columns, e.g. "col1", "col2", "col3", .... I will store these column names in an array and run over the dataframe by passing the array's values. But for now, please tell me a solution to this problem if it can be handled in spark-scala.
Try using string interpolation: col(s"${x}").
Example:
val df=Seq(("Ram","Physics",80),("Sham","English",90),("Ayan","Math",70)).toDF("Name","Subject","Marks")
df.show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics| 80|
//|Sham|English| 90|
//|Ayan| Math| 70|
//+----+-------+-----+
import org.apache.spark.sql.functions._
val x:String = "Marks"
df.withColumn(x, when(col(s"${x}") > 75, col(s"${x}") + 10).otherwise(col(s"${x}"))).show()
//+----+-------+-----+
//|Name|Subject|Marks|
//+----+-------+-----+
//| Ram|Physics| 90|
//|Sham|English| 100|
//|Ayan| Math| 70|
//+----+-------+-----+
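The question also asks about applying the same rule to several columns held in an array. A minimal sketch of one way to do that, assuming the same "> 75" rule applies to every listed column (the helper name bumpColumns is mine, not from the answer), is to fold the array over the DataFrame:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper: applies the same when/otherwise rewrite to every
// column name in `names`, threading the DataFrame through foldLeft so
// each step builds on the previous one's result.
def bumpColumns(df: DataFrame, names: Seq[String]): DataFrame =
  names.foldLeft(df) { (acc, c) =>
    acc.withColumn(c, when(col(c) > 75, col(c) + 10).otherwise(col(c)))
  }

// e.g. bumpColumns(df, Seq("col1", "col2", "col3"))
```

Since withColumn replaces a column in place when the name already exists, the fold leaves the schema unchanged and only rewrites the listed columns.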
Use functions.col as below:
df1.show(false)
/**
* +----+-------+-----+
* |Name|Subject|Marks|
* +----+-------+-----+
* |Ram |Physics|80 |
* |Sham|English|90 |
* |Ayan|Math |70 |
* +----+-------+-----+
*/
val x = "Marks"
// use functions.col
df1.withColumn(x, when(col(x) > 75, col(x) + 10).otherwise(col(x)))
.show()
/**
* +----+-------+-----+
* |Name|Subject|Marks|
* +----+-------+-----+
* | Ram|Physics| 90|
* |Sham|English| 100|
* |Ayan| Math| 70|
* +----+-------+-----+
*/
For better understanding, I separated the columns into requiredColumns and allColumns.
Check the code below.
scala> df.show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|80 |
|Sham|English|90 |
|Ayan|Math |70 |
+----+-------+-----+
scala> val requiredColumns = Set("Marks")
requiredColumns: scala.collection.immutable.Set[String] = Set(Marks)
scala> val allColumns = df.columns
allColumns: Array[String] = Array(Name, Subject, Marks)
scala>
val columnExpr = allColumns
  .filterNot(requiredColumns(_))
  .map(col(_)) ++ requiredColumns
  .map(c => when(col(c) > 75, col(c) + 10).otherwise(col(c)).as(c))
Output
scala> df.select(columnExpr:_*).show(false)
+----+-------+-----+
|Name|Subject|Marks|
+----+-------+-----+
|Ram |Physics|90 |
|Sham|English|100 |
|Ayan|Math |70 |
+----+-------+-----+
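The same select-based idea can also be written as a single map over all columns. This is a sketch of a variant, not part of the answer above; one reason to prefer it is that it keeps the DataFrame's original column order, whereas the filterNot/++ construction appends the rewritten columns at the end (it only looks unchanged here because Marks happens to be last):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

// Rewrite only the columns named in requiredColumns; every other column
// passes through untouched, and the select preserves the original order.
def rewriteRequired(df: DataFrame, requiredColumns: Set[String]): DataFrame = {
  val columnExpr = df.columns.map { c =>
    if (requiredColumns(c)) when(col(c) > 75, col(c) + 10).otherwise(col(c)).as(c)
    else col(c)
  }
  df.select(columnExpr: _*)
}

// e.g. rewriteRequired(df, Set("Marks")).show(false)
```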