
Error while performing aggregate functions in Spark: ArrayType cannot be cast to org.apache.spark.sql.types.StructType

I am creating a Spark DataFrame from a JSON file containing GPS data. When I try to compute the average of a column, I get the following error:

Py4JJavaError: An error occurred while calling o470.collectToPython.
: java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType

I don't understand this error, because none of my columns is an ArrayType. Here is my schema:

root
 |-- LastUpdateData: string (nullable = true)
 |-- DataGenerated: string (nullable = true)
 |-- Delay: long (nullable = true)
 |-- GPSQuality: long (nullable = true)
 |-- Lat: double (nullable = true)
 |-- Line: string (nullable = true)
 |-- Lon: double (nullable = true)
 |-- Route: string (nullable = true)
 |-- Speed: long (nullable = true)
 |-- VehicleCode: string (nullable = true)
 |-- VehicleId: long (nullable = true)
 |-- VehicleService: string (nullable = true)
StructType(List(StructField(LastUpdateData,StringType,true),StructField(DataGenerated,StringType,true),StructField(Delay,LongType,true),StructField(GPSQuality,LongType,true),StructField(Lat,DoubleType,true),StructField(Line,StringType,true),StructField(Lon,DoubleType,true),StructField(Route,StringType,true),StructField(Speed,LongType,true),StructField(VehicleCode,StringType,true),StructField(VehicleId,LongType,true),StructField(VehicleService,StringType,true)))
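
One way to double-check that a schema really contains no array column, including nested ones, is to walk its JSON form. The sketch below is plain Python and assumes the dictionary layout produced by `df.schema.jsonValue()`. The helper name `find_array_fields` and the `sample_schema` dict (a hand-written subset of the schema above) are illustrative and not part of the original post. It only descends into structs and arrays of structs, so map types are not inspected.

```python
def find_array_fields(schema_json, prefix=""):
    """Return dotted paths of array-typed fields in a Spark schema dict."""
    found = []
    for field in schema_json.get("fields", []):
        path = prefix + field["name"]
        ftype = field["type"]
        # Primitive types are plain strings ("long", "string", ...);
        # complex types (struct, array, map) are nested dicts.
        if not isinstance(ftype, dict):
            continue
        if ftype["type"] == "array":
            found.append(path)
            elem = ftype["elementType"]
            # Descend into arrays of structs as well.
            if isinstance(elem, dict) and elem.get("type") == "struct":
                found.extend(find_array_fields(elem, path + "."))
        elif ftype["type"] == "struct":
            found.extend(find_array_fields(ftype, path + "."))
    return found


# Hypothetical sample: three columns from the schema above, written by hand
# in the layout assumed for df.schema.jsonValue().
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "Delay", "type": "long", "nullable": True, "metadata": {}},
        {"name": "Lat", "type": "double", "nullable": True, "metadata": {}},
        {"name": "Line", "type": "string", "nullable": True, "metadata": {}},
    ],
}

print(find_array_fields(sample_schema))  # prints []
```

With a live DataFrame this would be called as `find_array_fields(df.schema.jsonValue())`. An empty list is consistent with the schema shown here and would suggest that the exception does not come from the declared column types themselves.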

Here is my code:

df.agg({"Delay": "avg"}).collect()

Try the following:

from pyspark.sql import functions

# Compute the average of the Delay column (the result is a one-row DataFrame)
delay_df = df.agg(functions.avg("Delay"))

# Display the result
delay_df.show()
