Spark (Scala): How to convert Array[Row] to Dataset[Row] or DataFrame?

Question

I have an Array[Row] and I want to convert it to a Dataset[Row] or a DataFrame.

How did I end up with an array of rows?

Well, I was trying to clean the nulls out of my dataset:

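without having to filter EACH column (I have a lot of them), and
without using the .na.drop() function from DataFrameNaFunctions, because it does not detect cells that actually contain the string "null".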

So I came up with the following line to filter out nulls across all columns.

val outDF = inputDF.columns.flatMap { col => inputDF.filter(col + "!='' AND " + col + "!='null'").collect() }

The problem is that outDF is an Array[Row], hence the question! Any ideas are welcome!

Answer 1

This is how your code could be made to work:

inputDF.columns.map {
  col => inputDF.filter((inputDF(col) =!= "") and (inputDF(col) =!= "null"))
}.reduce(_ union _)

But something like this:

import org.apache.spark.sql.functions.lit

inputDF.where(inputDF.columns.map {
  col => (inputDF(col) =!= "") and (inputDF(col) =!= "null")
}.foldLeft(lit(true))(_ and _))

is what you want.
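For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same filter. It assumes Spark 2.x or later and a local SparkSession; the object name DropNullLikeRows and the helper dropNullLike are made up for illustration and are not part of any Spark API.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object DropNullLikeRows {
  // Hypothetical helper: keeps only rows in which every column is
  // neither the empty string nor the literal string "null".
  // Rows holding a real null are dropped too, because the comparison
  // evaluates to null and where() only keeps rows that evaluate to true.
  def dropNullLike(df: DataFrame): DataFrame = {
    val perColumn: Seq[Column] = df.columns.toSeq.map { name =>
      (df(name) =!= "") and (df(name) =!= "null")
    }
    // AND all per-column conditions together, starting from a constant true.
    df.where(perColumn.foldLeft(lit(true))(_ and _))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("drop-null-like").getOrCreate()
    import spark.implicits._

    val inputDF = Seq(("1", "a"), ("2", ""), ("null", "")).toDF
    dropNullLike(inputDF).show()

    spark.stop()
  }
}
```

The result stays a DataFrame the whole time, so nothing is collected to the driver.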

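Note that the first solution creates non-exclusive subsets (a row is emitted once for every column it passes the check on), so with data like this: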

val inputDF = Seq(("1", "a"), ("2", ""), ("null", "")).toDF

the result will be:

+---+---+
| _1| _2|
+---+---+
|  1|  a|
|  2|   |
|  1|  a|
+---+---+

whereas the second solution, which I believe is the correct one, gives:

+---+---+
| _1| _2|
+---+---+
|  1|  a|
+---+---+
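As for the literal question in the title: if you really do end up with an Array[Row], one way to get back to a DataFrame is to pair the rows with a schema. A minimal sketch, assuming Spark 2.x or later, a local SparkSession, and that the schema of the original DataFrame still describes the rows; the object name RowsToDataFrame and the helper toDataFrame are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object RowsToDataFrame {
  // Hypothetical helper: distributes the collected rows again and
  // attaches the given schema to rebuild a DataFrame.
  def toDataFrame(spark: SparkSession, rows: Array[Row], schema: StructType): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(rows.toSeq), schema)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("rows-to-df").getOrCreate()
    import spark.implicits._

    val inputDF = Seq(("1", "a"), ("2", ""), ("null", "")).toDF

    // collect() brings every row to the driver as an Array[Row].
    val rows: Array[Row] = inputDF.collect()

    // Reuse the schema of the DataFrame the rows came from.
    val rebuilt = toDataFrame(spark, rows, inputDF.schema)
    rebuilt.show()

    spark.stop()
  }
}
```

Keep in mind that collect() pulls all the data onto the driver, so staying with DataFrame operations as above is usually the better option for anything large.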

Answer 2

Posting my comment as an answer.

df.na.drop(df.columns).where("'null' not in ("+df.columns.mkString(",")+")")
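To make it easier to see what ends up inside where(...), here is a small sketch in plain Scala (no Spark needed) that builds the same predicate string. The object name NullPredicate, both helper names, and the column names are made up for illustration. The backtick-quoted variant is only a suggestion for column names that contain spaces or other special characters, and it assumes the names themselves contain no backticks.

```scala
object NullPredicate {
  // Builds the predicate string exactly as in the one-liner above.
  def notNullString(columns: Seq[String]): String =
    "'null' not in (" + columns.mkString(",") + ")"

  // Assumption: same predicate, but with every column name wrapped in
  // backticks so that names with spaces still parse as identifiers.
  def notNullStringQuoted(columns: Seq[String]): String =
    "'null' not in (" + columns.map(c => s"`$c`").mkString(",") + ")"

  def main(args: Array[String]): Unit = {
    // Hypothetical column names.
    println(notNullString(Seq("_1", "_2")))
    println(notNullStringQuoted(Seq("first name", "city")))
  }
}
```

For columns _1 and _2 the first helper yields 'null' not in (_1,_2), i.e. a row is kept only if none of the listed columns equals the string 'null'.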

Answer 3

Following Mr. Srinivas's comment, I solved this with the following code:

//First drop all typical nulls
val prelimDF = inputDF.na.drop()

//Then drop all rows where any column literally contains the string 'null'
val finalDF = prelimDF.na.drop(prelimDF.columns).where("'null' not in ("+prelimDF.columns.mkString(",")+")")

Cheers!

3 answers

Answer 1: score 3, 2016-11-21 06:47:38
Answer 2: score 3, accepted, 2016-11-21 07:01:53
Answer 3: score 0, 2016-11-21 06:58:24
