
Scala query needed

Hello, I am getting an error with the following code.

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._


// Define case class for input data
case class Article(articleId: Int, title: String, url: String, publisher: String,
                   category: String, storyId: String, hostname: String, timestamp: String)
// Read the input data
val articles = spark.read.
  schema(Encoders.product[Article].schema).
  option("delimiter", ",").
  csv("hdfs:///user/ashhall1616/bdc_data/t4/news-small.csv").
  as[Article]

articles.createOrReplaceTempView("articles")

val writeDf = spark.sql("""SELECT articles.storyId AS storyId1, articles.publisher AS publisher1 
FROM articles
GROUP BY storyId
ORDER BY publisher1 ASC""")

Error:

val writeDf = spark.sql("""SELECT articles.storyId AS storyId1, articles.publisher AS publisher1 
     | FROM articles
     | GROUP BY storyId
     | ORDER BY publisher1 ASC""")
org.apache.spark.sql.AnalysisException: expression 'articles.`publisher`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Sort [publisher1#36 ASC NULLS FIRST], true
+- Aggregate [storyId#13], [storyId#13 AS storyId1#35, publisher#11 AS publisher1#36]
   +- SubqueryAlias articles
      +- Relation[articleId#8,title#9,url#10,publisher#11,category#12,storyId#13,hostname#14,timestamp#15] csv

The dataset looks like:

articleId | publisher | category | storyId | hostname

1 | Los Angeles Times | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.latimes.com

The goal is to create a list that pairs each story with every publisher that wrote at least one article for that story.

[ddUyU0VZz0BRneMioxUPQVP6sIxvM,Livemint]

[ddUyU0VZz0BRneMioxUPQVP6sIxvM,IFA Magazine]

[ddUyU0VZz0BRneMioxUPQVP6sIxvM, Moneynews]

[ddUyU0VZz0BRneMioxUPQVP6sIxvM,NASDAQ]

[dPhGU51DcrolUIMxbRm0InaHGA2XM,IFA Magazine]

[ddUyU0VZz0BRneMioxUPQVP6sIxvM,Los Angeles Times]

[dPhGU51DcrolUIMxbRm0InaHGA2XM,NASDAQ]

Can someone suggest an improvement to the code to get the required output?

The parser/compiler gets confused: you have a GROUP BY but no aggregate function.

Use DISTINCT on storyId, publisher instead.

Also check whether you still need storyId1 in a GROUP BY at all.
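A minimal sketch of that suggestion, assuming the same articles temp view created above (the variable names distinctDf and distinctDf2 are illustrative, not from the original post). SELECT DISTINCT replaces the GROUP BY, since no aggregate is being computed:

// Deduplicate (storyId, publisher) pairs instead of grouping without an aggregate
val distinctDf = spark.sql("""SELECT DISTINCT articles.storyId AS storyId1, articles.publisher AS publisher1
FROM articles
ORDER BY publisher1 ASC""")

// Equivalent DataFrame API form, shown for comparison
val distinctDf2 = articles
  .select($"storyId".as("storyId1"), $"publisher".as("publisher1"))
  .distinct()
  .orderBy($"publisher1".asc)

distinctDf.show(false)

Either form returns one row per distinct (storyId, publisher) pair, which matches the desired output in the question.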
