Issue with running Spark SQL query - column not found
I am working with Spark SQL, and I created a DataFrame named cars4 with the following code:
scala> val cars4 = spark.sql("SELECT maker, model, round(avg(mileage),0) avg_mileage, round(avg(price_eur),0) avg_price FROM cars_make_model_avgmileage_avgprice GROUP BY maker, model ORDER BY maker ASC, model ASC")
It looks like this:
cars4.show(30)
Then I create a view:
cars4.createOrReplaceTempView("cars_make_model_mileage_price_ratio")
Then, when I try to divide avg_mileage by avg_price on the above view with the following code:
val cars5 = spark.sql("SELECT maker, model, round(avg_mileage/avg_price,0) mileage_price_ratio FROM cars_make_model_mileage_price_ratio GROUP BY maker, model ORDER BY mileage_price_ratio ASC")
I get the following error:
I have checked, and the DataFrame cars4 has the following columns:
So why does the error in the screenshot say avg_mileage cannot be found? Any ideas?
In this query:
SELECT maker,
       model,
       round(avg_mileage/avg_price,0) mileage_price_ratio
FROM cars_make_model_mileage_price_ratio
GROUP BY maker, model
ORDER BY mileage_price_ratio ASC
you are grouping by maker and model (columns 1 and 2), but you are not applying any aggregate function to avg_mileage and avg_price (columns 3 and 4), which is what causes this error: Spark SQL requires every selected expression to appear in the GROUP BY clause or be wrapped in an aggregate. Since the view already contains one row per (maker, model) pair, you can simply drop the GROUP BY clause, or keep it and wrap the columns in an aggregate such as avg().
For the available aggregation options, see https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.agg
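As a minimal sketch of the corrected query, here it is run against Python's built-in sqlite3 instead of Spark SQL, so the example is self-contained. The table and column names mirror the question, but the sample rows are invented for the demo; the key change is that the GROUP BY is dropped, since the view already has one row per (maker, model):

```python
import sqlite3

# Build an in-memory table shaped like the cars4 view from the question.
# The data rows below are made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cars_make_model_mileage_price_ratio (
        maker TEXT, model TEXT, avg_mileage REAL, avg_price REAL
    )
""")
conn.executemany(
    "INSERT INTO cars_make_model_mileage_price_ratio VALUES (?, ?, ?, ?)",
    [("audi", "a4", 150000.0, 9000.0),
     ("bmw", "x3", 120000.0, 15000.0)],
)

# Corrected query: no GROUP BY is needed because each (maker, model)
# already appears exactly once, so the ratio is a plain row-wise expression.
rows = conn.execute("""
    SELECT maker, model,
           ROUND(avg_mileage / avg_price, 0) AS mileage_price_ratio
    FROM cars_make_model_mileage_price_ratio
    ORDER BY mileage_price_ratio ASC
""").fetchall()

for row in rows:
    print(row)
```

The same shape of query (minus the GROUP BY) should run unchanged through `spark.sql(...)` against the temp view in the question.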