Extract Array[T] from Spark dataframe in Scala
I am trying to find a specific array (of type Double) based on another column holding the minimum value. The code below extracts the array, but I cannot receive it as an Array[Double]. I have tried the map and cast approaches found in other threads, but could not solve it. I would appreciate any hints. Here is an illustration:
scala> df.show
+----+---------------+
|time| crds|
+----+---------------+
|12.0|[0.1, 2.1, 1.2]|
| 8.0|[1.1, 2.1, 3.2]|
| 9.0|[1.1, 1.1, 2.2]|
+----+---------------+
scala> val minTime = df.select(min(col("time"))).collect()(0)(0).toString.toDouble
minTime: Double = 8.0
scala> val crd = df.filter($"time" === minTime).select($"crds").take(1)
crd: Array[org.apache.spark.sql.Row] = Array([WrappedArray(1.1, 2.1, 3.2)])
scala> val res: Array[Double] = crd.array
<console>:29: error: type mismatch;
found : Array[org.apache.spark.sql.Row]
required: Array[Double]
val res: Array[Double] = crd.array
^
scala>
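The mismatch arises because `take(1)` returns an `Array[Row]`; the `Array[Double]` is still wrapped inside the first `Row`. A minimal sketch of unwrapping it (assuming the spark-shell session above, where `df` and `minTime` are already defined; `Row.getSeq` reads an array column back as a `Seq`):

```scala
// Sketch, not a definitive fix: pull the first matching Row,
// then read its array column (at position 0) as a Seq[Double].
val row = df.filter($"time" === minTime).select($"crds").head()
val res: Array[Double] = row.getSeq[Double](0).toArray
```

`getSeq` sidesteps the `WrappedArray` representation entirely, which also makes the code independent of the Scala collection version Spark was built against.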
This may be a clumsy workaround, but it works, assuming only one row hits the minimum.
scala> val df = Seq(
| (12.0, Array(0.1, 2.1, 1.2)),
| (8.0, Array(1.1, 2.1, 3.2)),
| (9.0, Array(1.1, 1.1, 2.2))
| ).toDF("time", "crds")
df: org.apache.spark.sql.DataFrame = [time: double, crds: array<double>]
scala> val minTime = df.select(min(col("time"))).collect()(0)(0).toString.toDouble
minTime: Double = 8.0
scala> val crd = df.filter($"time" === minTime).select(explode(col("crds"))).collect().map(i => i(0)).map(_.toString.toDouble)
crd: Array[Double] = Array(1.1, 2.1, 3.2)
scala>
...
import scala.collection.mutable.WrappedArray
val crd = df.filter...select($"crds").first.getAs[WrappedArray[Double]](0).toArray
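Putting the pieces together as one self-contained spark-shell sketch (assumes Spark 2.x or later with `spark.implicits._` in scope, as in the shell; `getSeq` is used here instead of `getAs[WrappedArray[...]]` to stay version-independent):

```scala
import org.apache.spark.sql.functions.{col, min}

val df = Seq(
  (12.0, Array(0.1, 2.1, 1.2)),
  (8.0,  Array(1.1, 2.1, 3.2)),
  (9.0,  Array(1.1, 1.1, 2.2))
).toDF("time", "crds")

// min(...) over the whole frame yields a single Row; read it as Double directly
val minTime: Double = df.select(min(col("time"))).head().getDouble(0)

// first() gives the Row with the earliest time; getSeq unwraps the
// array column without the Array[Row] type mismatch from the question
val crd: Array[Double] = df.filter($"time" === minTime)
  .select($"crds").first().getSeq[Double](0).toArray
```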