Failed to execute user defined function (VectorAssembler)

Question

I am using KMeans as my clustering algorithm. When I run my code, it fails with this error:

org.apache.spark.SparkException: Failed to execute user defined function(VectorAssembler$$Lambda$1525/671078904: (struct<latitude:double,longitude:double>) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)

Here is the DataFrame code:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.DoubleType

// Cast the coordinate columns to double.
val st = stations
    .withColumn("longitude", $"longitude".cast(DoubleType))
    .withColumn("latitude", $"latitude".cast(DoubleType))
// Combine latitude and longitude into a single vector column named "location".
val stationVA = new VectorAssembler()
    .setInputCols(Array("latitude", "longitude"))
    .setOutputCol("location")
val stationWithLoc = stationVA.transform(st)

println("Assembled columns 'latitude', 'longitude' to vector column 'location'")
stationWithLoc.select("name", "position").show(false)

stationWithLoc.printSchema()
stationWithLoc.show()

printSchema() works, but as soon as I call show() I get the error.

Answer 1

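For me, the problem was in the data: I was using a CSV file that had line breaks in the middle of rows. After fixing the file, check with df.head(1) that the data is read correctly with all columns.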
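If the line breaks sit inside quoted fields, one possible way to read such a file is Spark's `multiLine` CSV option. The following is a minimal sketch, not the original poster's code: the sample file contents, the temporary path, and the local `SparkSession` are all made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("multiline-csv").getOrCreate()

// Hypothetical sample file: the name of the first station contains a
// line break inside a quoted field.
val path = Files.createTempFile("stations", ".csv")
val sample = "name,latitude,longitude\n\"Gare\nCentrale\",48.85,2.35\nPort,43.29,5.37\n"
Files.write(path, sample.getBytes("UTF-8"))

val stations = spark.read
  .option("header", "true")       // first line holds the column names
  .option("multiLine", "true")    // allow a quoted field to span several lines
  .option("inferSchema", "true")  // infer numeric types for the coordinates
  .csv(path.toString)

// Inspect the first row and the schema to confirm the columns line up.
stations.head(1).foreach(println)
stations.printSchema()
```

Without `multiLine`, each physical line is parsed as its own record, so a broken row can leave the coordinate columns empty or shifted. If the line breaks are not inside quotes, this option will not help and the file itself has to be repaired.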

Answer 2

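This question is old, but I just ran into the same problem in PySpark.

I believe the error is related to null values in the data. Running fillna() on my columns before using VectorAssembler resolved the error.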
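Since the question uses Scala, here is a rough Scala counterpart of that fix, using `na.fill` and `na.drop` on the DataFrame. It is only a sketch under assumptions: the two sample rows, the placeholder value 0.0, and the local `SparkSession` are hypothetical.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("assembler-nulls").getOrCreate()
import spark.implicits._

// Hypothetical sample data: station "B" has no latitude.
val st = Seq[(String, Option[Double], Option[Double])](
  ("A", Some(48.85), Some(2.35)),
  ("B", None, Some(5.37))
).toDF("name", "latitude", "longitude")

// Count the rows that have a null in either input column.
val nullRows = st.filter($"latitude".isNull || $"longitude".isNull).count()
println(s"Rows with null coordinates: $nullRows")

val assembler = new VectorAssembler()
  .setInputCols(Array("latitude", "longitude"))
  .setOutputCol("location")

// Option 1: replace nulls with a placeholder value before assembling.
val filled = assembler.transform(st.na.fill(0.0, Seq("latitude", "longitude")))

// Option 2: drop the incomplete rows before assembling.
val dropped = assembler.transform(st.na.drop(Seq("latitude", "longitude")))

filled.show(false)
dropped.show(false)
```

For coordinates that feed a clustering step, dropping incomplete rows is probably safer than filling them with a placeholder, because a made-up location can distort the clusters. From Spark 2.4 on, `VectorAssembler` also has a `handleInvalid` parameter (for example `setHandleInvalid("skip")`) that filters such rows during the transform.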

Answer 1: 2019-11-25 08:38:06
Answer 2: 2021-11-04 17:16:48
