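Spark SQL: SELECT from the result of a GROUP BY SELECT
I have a table named table_new. As a first step, I want to group the results by id, kmstand, vacationame and vacationvalue, keeping only the groups whose count is exactly one. For this step I wrote the following query: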
SELECT id, kmstand, vacationame, vacationvalue
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame, vacationvalue
HAVING COUNT(*) = 1 ORDER BY id, kmstand DESC
The result is:
row id kmstand vacationame vacationvalue
1 1 4000 vacation1 munich
2 1 4000 vacation1 stuttgart
3 1 5500 vacation4 koln
4 1 5500 vacation2 frankfurt
5 1 5500 vacation3 berlin
6 1 5500 vacation1 potsdam
7 2 6000 vacation2 new york
8 2 6000 vacation1 bangladesh
9 2 3000 vacation1 washington
10 2 3000 vacation3 chicago
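To see what the HAVING COUNT(*) = 1 filter in the first step does, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database. SQLite is only a stand-in for Spark here, and the table contents (including the duplicated row and the `vacation5` row) are made-up illustration data, not the asker's real table. The IN list is just a shorter spelling of the OR chain.

```python
import sqlite3

# Hypothetical sample rows: (id, kmstand, vacationame, vacationvalue).
rows = [
    (1, 4000, "vacation1", "munich"),
    (1, 4000, "vacation1", "stuttgart"),
    (1, 5500, "vacation4", "koln"),
    (2, 6000, "vacation2", "new york"),
    (2, 6000, "vacation2", "new york"),  # exact duplicate: group count is 2
    (2, 3000, "vacation5", "chicago"),   # name not in the filter list
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_new "
    "(id INTEGER, kmstand INTEGER, vacationame TEXT, vacationvalue TEXT)"
)
conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", rows)

# Step 1: keep only the four-column groups that occur exactly once.
step1 = conn.execute("""
    SELECT id, kmstand, vacationame, vacationvalue
    FROM table_new
    WHERE vacationame IN ('vacation1', 'vacation2', 'vacation3', 'vacation4')
    GROUP BY id, kmstand, vacationame, vacationvalue
    HAVING COUNT(*) = 1
    ORDER BY id, kmstand DESC
""").fetchall()

for row in step1:
    print(row)
```

With this data, the duplicated `new york` row and the `vacation5` row are dropped, while `munich` and `stuttgart` both survive because their vacationvalue differs.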
Now I want to select the ids for which the combination of kmstand and vacationame is distinct. This means the result should be:
row id kmstand vacationame vacationvalue
1 1 5500 vacation4 koln
2 1 5500 vacation2 frankfurt
3 1 5500 vacation3 berlin
4 1 5500 vacation1 potsdam
5 2 6000 vacation2 new york
6 2 6000 vacation1 bangladesh
7 2 3000 vacation1 washington
8 2 3000 vacation3 chicago
For this, I wrote the following nested SQL query:
SELECT id, kmstand, count(*) as cnt
FROM `db_1`.`table_new`
WHERE (SELECT id, kmstand, vacationame, vacationvalue
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame, vacationvalue
HAVING COUNT(*) = 1 ORDER BY id, kmstand DESC)
GROUP BY id, kmstand
HAVING cnt = 1
ORDER BY id, kmstand DESC
I also tried it without the WHERE clause and without the FROM, but did not find a solution. For this SQL query I get the following error message:
org.apache.spark.sql.AnalysisException: cannot recognize input near 'SELECT' 'id' ',' in expression specification; line 3 pos 7
I am fairly sure the syntax is incorrect. Do you have any suggestions?
Here is the answer to the question. With this query I can get the ids for which every combination of id, kmstand and vacationame is distinct.
SELECT id, sumcnt, cnt2
FROM(
SELECT id, count(*) as cnt2, sum(cnt) as sumcnt
FROM (
SELECT id, kmstand, vacationame, count(*) as cnt
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame)T
GROUP BY id)T
WHERE (sumcnt/cnt2 = 1)
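As a rough check of what this query returns, the sketch below runs it from Python against an in-memory SQLite database. Several things here are assumptions: SQLite stands in for Spark, the table is filled with the ten rows shown in the question's first result, and the two derived tables get the hypothetical aliases `per_group` and `per_id`. Because SQLite divides integers without a remainder, the filter is written as `sumcnt = cnt2`, which is the same condition as `sumcnt/cnt2 = 1` under real division.

```python
import sqlite3

# Assumed table contents: the ten rows from the question's first result.
rows = [
    (1, 4000, "vacation1", "munich"),
    (1, 4000, "vacation1", "stuttgart"),
    (1, 5500, "vacation4", "koln"),
    (1, 5500, "vacation2", "frankfurt"),
    (1, 5500, "vacation3", "berlin"),
    (1, 5500, "vacation1", "potsdam"),
    (2, 6000, "vacation2", "new york"),
    (2, 6000, "vacation1", "bangladesh"),
    (2, 3000, "vacation1", "washington"),
    (2, 3000, "vacation3", "chicago"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_new "
    "(id INTEGER, kmstand INTEGER, vacationame TEXT, vacationvalue TEXT)"
)
conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", rows)

# Innermost query: row count per (id, kmstand, vacationame).
# Middle query: per id, number of groups (cnt2) and total rows (sumcnt).
# Outer filter: keep ids where every group holds exactly one row.
ids = conn.execute("""
    SELECT id, sumcnt, cnt2
    FROM (
        SELECT id, COUNT(*) AS cnt2, SUM(cnt) AS sumcnt
        FROM (
            SELECT id, kmstand, vacationame, COUNT(*) AS cnt
            FROM table_new
            WHERE vacationame IN ('vacation1', 'vacation2', 'vacation3', 'vacation4')
            GROUP BY id, kmstand, vacationame
        ) per_group
        GROUP BY id
    ) per_id
    WHERE sumcnt = cnt2
""").fetchall()

print(ids)  # [(2, 4, 4)]
```

On this data only id 2 is returned: id 1 has six rows in five groups, because (4000, vacation1) occurs twice. Note that the query yields one line per id, not the individual rows.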
I am not familiar with Spark, but you probably want:
SELECT id, kmstand, count(*) as cnt
FROM (SELECT id, kmstand, vacationame, vacationvalue
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame, vacationvalue
HAVING COUNT(*) = 1) T
GROUP BY id, kmstand
HAVING cnt = 1
ORDER BY id, kmstand DESC
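Note that I added an alias (T) for the derived table in the FROM clause. This may or may not be required, depending on your RDBMS.
Also note that you generally cannot use ORDER BY inside a subquery.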
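The derived-table pattern (a subquery in FROM with an alias, and ORDER BY only on the outermost query) can be sketched as follows. This is one possible way to produce the rows the question lists as the expected result, not necessarily what either answer intended. The assumptions: SQLite stands in for Spark, the table holds the ten rows from the question's first result, and `MIN(vacationvalue)` is used only to carry the single value of each one-row group through the outer GROUP BY.

```python
import sqlite3

# Assumed table contents: the ten rows from the question's first result.
rows = [
    (1, 4000, "vacation1", "munich"),
    (1, 4000, "vacation1", "stuttgart"),
    (1, 5500, "vacation4", "koln"),
    (1, 5500, "vacation2", "frankfurt"),
    (1, 5500, "vacation3", "berlin"),
    (1, 5500, "vacation1", "potsdam"),
    (2, 6000, "vacation2", "new york"),
    (2, 6000, "vacation1", "bangladesh"),
    (2, 3000, "vacation1", "washington"),
    (2, 3000, "vacation3", "chicago"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_new "
    "(id INTEGER, kmstand INTEGER, vacationame TEXT, vacationvalue TEXT)"
)
conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", rows)

# The step-1 query sits in FROM as derived table T (no ORDER BY inside).
# The outer query keeps (id, kmstand, vacationame) groups with one row.
unique_rows = conn.execute("""
    SELECT id, kmstand, vacationame, MIN(vacationvalue) AS vacationvalue
    FROM (
        SELECT id, kmstand, vacationame, vacationvalue
        FROM table_new
        WHERE vacationame IN ('vacation1', 'vacation2', 'vacation3', 'vacation4')
        GROUP BY id, kmstand, vacationame, vacationvalue
        HAVING COUNT(*) = 1
    ) T
    GROUP BY id, kmstand, vacationame
    HAVING COUNT(*) = 1
    ORDER BY id, kmstand DESC
""").fetchall()

for row in unique_rows:
    print(row)
```

On this data the two (1, 4000, vacation1) rows are dropped and the remaining eight rows are returned, matching the expected result in the question.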