How to convert a multidimensional array to a DataFrame using Spark in Scala?

Question

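This is my first time using Spark or Scala, so I'm a beginner. I have a two-dimensional array that I need to convert to a DataFrame. The sample data is a join table of rectangles (in double form) and points (a, b), also doubles, plus a boolean indicating whether the point is inside the rectangle. My end goal is to return a DataFrame with each rectangle's name and the number of times it appears where ST_Contains is true. Since the query returns all the instances that are true, I'm just trying to sort by rectangle (the rectangles are named by doubles) and count the occurrences of each one. I put the results in an array and then tried to convert it to a dataset. Here is some of my code and what I've tried: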

// Join two datasets (not my code)
spark.udf.register("ST_Contains",(queryRectangle:String, pointString:String)=>(HotzoneUtils.ST_Contains(queryRectangle, pointString)))
val joinDf = spark.sql("select rectangle._c0 as rectangle, point._c5 as point from rectangle,point where ST_Contains(rectangle._c0,point._c5)")
joinDf.createOrReplaceTempView("joinResult")

// MY CODE
// above join gets a view with rectangle, point, and true. so I need to loop through and count how many for each rectangle
//sort by rectangle asc first
joinDf.orderBy("rectangle")

var a = Array.ofDim[String](1, 2)
for (row <- joinDf.rdd.collect){  
    var count = 1
    var previous_r = -1.0
    
    var r = row.mkString(",").split(",")(0).toDouble
    var p = row.mkString(",").split(",")(1).toDouble
    var c = row.mkString(",").split(",")(2).toDouble
    
    if (previous_r != -1){
        if (previous_r == r){
            //add another to the count
            count = count + 1
        }
        else{
            //stick the result in an array
            a ++= Array(Array(previous_r.toString, count.toString))
        }
    }    
    previous_r = r
}
//create dataframe from array and return it
val df = spark.createDataFrame(a).toDF()

But I keep getting this error:

inferred type arguments [Array[String]] do not conform to method createDataFrame's type parameter bounds [A <: Product]
val df = spark.createDataFrame(a).toDF()

I also tried it without the .toDF() part, but still no luck. I also tried it with only .toDF and no createDataFrame call, but that didn't work either.

Answer 1

A few things here:

createDataFrame has several overloads; the one your call ends up resolving to is probably:

    def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame

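An Array[String] is not a Seq[A <: Product]: String is not a Product. In your code `a` is an Array[Array[String]], so A is inferred as Array[String], which is not a Product either (tuples and case classes are).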
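
To see why the bound rejects the question's array, here is a minimal plain-Scala sketch (no Spark needed) of a method with the same `A <: Product` bound. The names `ProductBound`, `rowCount` and `RectangleCount` are hypothetical stand-ins, not Spark API. Tuples and case classes satisfy the bound, while an `Array[String]` element type triggers the same kind of compile error as in the question.

```scala
// Hypothetical stand-in for a Product-bounded method such as createDataFrame.
object ProductBound {
  // Accepts only sequences whose element type is a Product (tuples, case classes).
  def rowCount[A <: Product](data: Seq[A]): Int = data.length

  // Hypothetical row type for "rectangle name + count".
  case class RectangleCount(rectangle: String, count: Long)

  // Compiles: (String, Long) is a Tuple2, which extends Product.
  val fromTuples: Int = rowCount(Seq(("1.0", 2L), ("2.0", 5L)))

  // Compiles: every case class extends Product.
  val fromCaseClass: Int = rowCount(Seq(RectangleCount("1.0", 2L)))

  // Would NOT compile: A would be inferred as Array[String], which is not a Product.
  // rowCount(Seq(Array("1.0", "2")))
}
```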

The quickest way I can think of is to go through a Seq and then convert that to a DataFrame:

import spark.implicits._

Array("some string")
  .toSeq
  .toDF
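
Applied to the question's data, the same idea means turning each inner `Array(rectangle, count)` into a tuple before calling `toDF`. A minimal sketch, assuming Spark 2.x or 3.x with `spark-sql` on the classpath; `ArrayToDataFrame` and `toDataFrame` are hypothetical names, and the column names `rectangle` and `count` are chosen for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: converts rows shaped like the question's `a`
// (each inner array is Array(rectangle, count)) into a two-column DataFrame.
object ArrayToDataFrame {
  def toDataFrame(spark: SparkSession, rows: Array[Array[String]]): DataFrame = {
    import spark.implicits._ // provides Encoders and the Seq => DatasetHolder conversion
    rows
      .filter(r => r.length == 2 && r.forall(_ != null)) // drop incomplete or all-null rows
      .map(r => (r(0), r(1).toLong))                      // a tuple is a Product, unlike Array[String]
      .toSeq
      .toDF("rectangle", "count")
  }
}
```

Filtering out the null row matters because `Array.ofDim[String](1, 2)` starts `a` with one row of two nulls, which `toLong` would fail on.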

Or parallelize the Array[String] into an RDD[String] and then create the DataFrame from that.
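
The RDD route could look like this, again a sketch assuming Spark 2.x or 3.x with `spark-sql` on the classpath; `RddToDataFrame` and `stringsToDataFrame` are hypothetical names, and the column name `value` is chosen for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: Array[String] -> RDD[String] -> DataFrame.
object RddToDataFrame {
  def stringsToDataFrame(spark: SparkSession, values: Array[String]): DataFrame = {
    import spark.implicits._ // supplies Encoder[String] and the RDD => DatasetHolder conversion
    spark.sparkContext
      .parallelize(values.toSeq) // distribute the local array as an RDD[String]
      .toDF("value")             // one string column named "value"
  }
}
```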

The second toDF() adds nothing: createDataFrame already returns a DataFrame (once the call compiles).
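
Since the question's end goal is a DataFrame of rectangles and their match counts, the manual loop and the intermediate array can arguably be skipped: `groupBy(...).count()` already returns a DataFrame. A minimal sketch, assuming `joinDf` has the `rectangle` and `point` columns produced by the question's SQL; `CountPerRectangle` is a hypothetical name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: counts how many joined points fall in each rectangle.
object CountPerRectangle {
  // joinDf is assumed to have a "rectangle" column (one row per contained point).
  def countPerRectangle(joinDf: DataFrame): DataFrame =
    joinDf
      .groupBy("rectangle")      // one group per rectangle
      .count()                   // adds a Long column named "count"
      .orderBy(col("rectangle")) // deterministic output order
}
```

Grouping does not need the data to be sorted first; the `orderBy` only makes the output order predictable. The equivalent SQL against the `joinResult` view registered in the question would be `select rectangle, count(*) as num_points from joinResult group by rectangle order by rectangle`.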

Answer 1: score 0, 2021-12-05 13:14:34