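SQL - join two tables, but get average of the joined column only by left table occurrences
I want to join two tables but compute the average only over the left table's rows, so that each document's score is counted once no matter how many entities it joins to.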
Document (document_sentiment):
+-----+------+-------+
| dId | name | score |
+-----+------+-------+
| A   | n1   | 100   |
| B   | n1   | 70    |
+-----+------+-------+
Entity (entity_sentiment):
+-------+---------+-----+
| ename | details | dId |
+-------+---------+-----+
| e1    | a       | A   |
| e2    | a       | A   |
| e3    | b       | A   |
| e4    | c       | B   |
+-------+---------+-----+
Expected output:
+------+---------+----------------+
| name | average | entities       |
+------+---------+----------------+
| n1   | 85      | e1, e2, e3, e4 |
+------+---------+----------------+
Because (100 + 70) / 2 = 85
Current output:
+------+---------+----------------+
| name | average | entities       |
+------+---------+----------------+
| n1   | 92.5    | e1, e2, e3, e4 |
+------+---------+----------------+
Because the join repeats document A's score once for each of its three entities: (100 + 100 + 100 + 70) / 4 = 92.5
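The inflated figure comes from join fan-out: document A matches three entity rows, so its score is counted three times. A minimal sketch that reproduces this, assuming Python's built-in sqlite3 module as a stand-in for the real database (the in-memory tables simply mirror the sample data above and are not part of the original question):

```python
import sqlite3

# In-memory SQLite database used only as an illustrative stand-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document_sentiment (dId TEXT, name TEXT, score REAL);
    CREATE TABLE entity_sentiment (ename TEXT, details TEXT, dId TEXT);
    INSERT INTO document_sentiment VALUES ('A', 'n1', 100), ('B', 'n1', 70);
    INSERT INTO entity_sentiment VALUES
        ('e1', 'a', 'A'), ('e2', 'a', 'A'), ('e3', 'b', 'A'), ('e4', 'c', 'B');
""")

# The join yields one row per (document, entity) pair,
# so document A's score shows up three times.
joined_scores = [row[0] for row in conn.execute("""
    SELECT docT.score
    FROM document_sentiment docT
    JOIN entity_sentiment entityT ON docT.dId = entityT.dId
    ORDER BY entityT.ename
""")]

# Average over the joined rows versus average over the documents themselves.
avg_after_join = sum(joined_scores) / len(joined_scores)
avg_per_document = conn.execute(
    "SELECT AVG(score) FROM document_sentiment WHERE name = 'n1'"
).fetchone()[0]

print(joined_scores)     # [100.0, 100.0, 100.0, 70.0]
print(avg_after_join)    # 92.5
print(avg_per_document)  # 85.0
```

Any fix therefore has to make sure each document contributes its score exactly once before or while the entities are attached.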
Current query:
SELECT
    docT.name,
    AVG(docT.score),
    STRING_AGG(entityT.ename)
FROM
    document_sentiment docT
JOIN
    entity_sentiment entityT
ON
    docT.dId = entityT.dId
GROUP BY
    docT.name
How can I get the score shown in the expected output?
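Try the code below:
-- Inner query: one row per document, so each score appears once.
-- Outer query: average those per-document scores for each name.
SELECT name, AVG(score) AS average, STRING_AGG(ename) AS entities
FROM (
    SELECT
        docT.name,
        docT.dId,
        MAX(docT.score) AS score,
        STRING_AGG(entityT.ename) AS ename
    FROM
        document_sentiment docT
    JOIN
        entity_sentiment entityT
    ON
        docT.dId = entityT.dId
    GROUP BY
        docT.name, docT.dId
) sub
GROUP BY name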
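Try this (MySQL syntax):
-- T holds the average per name; joining back to the documents keeps every dId
-- available, so entities from all documents of that name are collected.
SELECT T.name, T.av,
    GROUP_CONCAT(DISTINCT entityT.ename ORDER BY entityT.ename SEPARATOR ', ') AS entities
FROM (
    SELECT docT.name,
        AVG(docT.score) av
    FROM document_sentiment docT
    GROUP BY docT.name) T
JOIN document_sentiment docT ON docT.name = T.name
JOIN entity_sentiment entityT ON docT.dId = entityT.dId
GROUP BY T.name, T.av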
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
    docT.name,
    AVG(docT.score) average,
    STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
    SELECT dId, STRING_AGG(ename) ename
    FROM `project.dataset.entity_sentiment`
    GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
You can test and play with the above using the sample data from the question, as in the example below:
#standardSQL
WITH `project.dataset.document_sentiment` AS (
    SELECT 'A' dId, 'n1' name, 100 score UNION ALL
    SELECT 'B', 'n1', 70
), `project.dataset.entity_sentiment` AS (
    SELECT 'e1' ename, 'a' details, 'A' dId UNION ALL
    SELECT 'e2', 'a', 'A' UNION ALL
    SELECT 'e3', 'b', 'A' UNION ALL
    SELECT 'e4', 'c', 'B'
)
SELECT
    docT.name,
    AVG(docT.score) average,
    STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
    SELECT dId, STRING_AGG(ename) ename
    FROM `project.dataset.entity_sentiment`
    GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
with result:
Row  name  average  entities
1    n1    85.0     e1,e2,e3,e4
This is tricky. I think window functions are probably the simplest solution:
SELECT docT.name, docT.avg_score,
    STRING_AGG(entityT.ename)
FROM (SELECT docT.*,
          AVG(docT.score) OVER (PARTITION BY docT.name) AS avg_score
      FROM document_sentiment docT
     ) docT JOIN
     entity_sentiment entityT
     ON docT.dId = entityT.dId
GROUP BY docT.name, docT.avg_score;
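Why is this tricky? If you aggregate the documents by name, you lose dId and can no longer perform the JOIN. Pre-aggregating the documents therefore doesn't solve the problem. Fortunately, window functions do: they attach the per-name average to every document row while keeping dId available for the join.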
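A quick way to check the window-function approach against the sample data is sketched below. It assumes SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module as a stand-in for BigQuery, with SQLite's GROUP_CONCAT swapped in for STRING_AGG; the inner alias `d` is chosen here only for readability.

```python
import sqlite3

# In-memory SQLite database mirroring the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document_sentiment (dId TEXT, name TEXT, score REAL);
    CREATE TABLE entity_sentiment (ename TEXT, details TEXT, dId TEXT);
    INSERT INTO document_sentiment VALUES ('A', 'n1', 100), ('B', 'n1', 70);
    INSERT INTO entity_sentiment VALUES
        ('e1', 'a', 'A'), ('e2', 'a', 'A'), ('e3', 'b', 'A'), ('e4', 'c', 'B');
""")

# The window function computes the per-name average BEFORE the join,
# so the fan-out to entities cannot distort it; dId stays available.
rows = conn.execute("""
    SELECT docT.name, docT.avg_score, GROUP_CONCAT(entityT.ename) AS entities
    FROM (
        SELECT d.*, AVG(d.score) OVER (PARTITION BY d.name) AS avg_score
        FROM document_sentiment d
    ) docT
    JOIN entity_sentiment entityT ON docT.dId = entityT.dId
    GROUP BY docT.name, docT.avg_score
""").fetchall()

name, avg_score, entities = rows[0]
# GROUP_CONCAT does not guarantee an order, so sort before displaying.
print(name, avg_score, sorted(entities.split(",")))
```

Because avg_score is identical on every row of a given name, adding it to the GROUP BY does not split the group; it only satisfies the rule that selected columns must be grouped or aggregated.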