Scala Spark query needed
Hello, I am getting an error with the following code.
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._
// Define case class for input data
case class Article(articleId: Int, title: String, url: String, publisher: String,
category: String, storyId: String, hostname: String, timestamp: String)
// Read the input data
val articles = spark.read.
schema(Encoders.product[Article].schema).
option("delimiter", ",").
csv("hdfs:///user/ashhall1616/bdc_data/t4/news-small.csv").
as[Article]
articles.createOrReplaceTempView("articles")
val writeDf = spark.sql("""SELECT articles.storyId AS storyId1, articles.publisher AS publisher1
FROM articles
GROUP BY storyId
ORDER BY publisher1 ASC""")
Error:
val writeDf = spark.sql("""SELECT articles.storyId AS storyId1, articles.publisher AS publisher1
| FROM articles
| GROUP BY storyId
| ORDER BY publisher1 ASC""")
org.apache.spark.sql.AnalysisException: expression 'articles.`publisher`' is neither present in the group by, nor is it an aggregate function. Add to group by or w
rap in first() (or first_value) if you don't care which value you get.;;
Sort [publisher1#36 ASC NULLS FIRST], true
+- Aggregate [storyId#13], [storyId#13 AS storyId1#35, publisher#11 AS publisher1#36]
+- SubqueryAlias articles
+- Relation[articleId#8,title#9,url#10,publisher#11,category#12,storyId#13,hostname#14,timestamp#15] csv
The dataset looks like:
articleId | publisher | category | storyId | hostname
1 | Los Angeles Times | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.latimes.com
The goal is to create, for each story, a list pairing it with every publisher that wrote at least one article about that story:
[ddUyU0VZz0BRneMioxUPQVP6sIxvM,Livemint]
[ddUyU0VZz0BRneMioxUPQVP6sIxvM,IFA Magazine]
[ddUyU0VZz0BRneMioxUPQVP6sIxvM,Moneynews]
[ddUyU0VZz0BRneMioxUPQVP6sIxvM,NASDAQ]
[dPhGU51DcrolUIMxbRm0InaHGA2XM,IFA Magazine]
[ddUyU0VZz0BRneMioxUPQVP6sIxvM,Los Angeles Times]
[dPhGU51DcrolUIMxbRm0InaHGA2XM,NASDAQ]
Can someone suggest how to improve the code to get the desired output?
The parser and compiler are getting confused: you have a GROUP BY with no aggregate function, yet the SELECT list includes publisher, which is neither grouped nor aggregated. Since you just want each distinct (storyId, publisher) pair, drop the GROUP BY and use DISTINCT on storyId and publisher instead. Also check whether you still need storyId1 in a GROUP BY at all.
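A minimal sketch of the suggested fix, reusing the `articles` temp view registered in the question (not tested against your cluster; paths and column names are taken from your code):

```scala
// DISTINCT keeps each (storyId, publisher) pair exactly once, which is
// what the invalid GROUP BY was trying to express. No aggregate function
// is needed, so the AnalysisException goes away.
val writeDf = spark.sql("""
  SELECT DISTINCT storyId AS storyId1, publisher AS publisher1
  FROM articles
  ORDER BY publisher1 ASC
""")
writeDf.show(false)
```

Equivalently, in the DataFrame API you could write `articles.select($"storyId", $"publisher").distinct().orderBy($"publisher")`, which Spark plans the same way.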