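SQL - join two tables, but get average of the joined column only by left table occurrences
I want to join two tables, but compute the average of a left-table column so that each left-table row is counted only once, no matter how many rows it matches in the joined table.
Document: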
+-----+-----+-------+
| dId | name| score |
+-----+-----+-------+
| A | n1 | 100 |
| B | n1 | 70 |
+-----+-----+-------+
Entity:
+------+------------+-----+
| ename| details | dId |
+------+------------+-----+
| e1 | a | A |
| e2 | a | A |
| e3 | b | A |
| e4 | c | B |
+------+------------+-----+
Expected output:
+------+--------+---------------+
| name | average| entities |
+------+--------+---------------+
| n1 | 85 |e1, e2, e3, e4 |
+------+--------+---------------+
because (100 + 70) / 2 = 85
Current output:
+------+--------+---------------+
| name | average| entities |
+------+--------+---------------+
| n1 | 92.5 |e1, e2, e3, e4 |
+------+--------+---------------+
because (100 + 100 + 100 + 70) / 4 = 92.5
Current query:
SELECT
docT.name,
AVG(docT.score),
STRING_AGG(entityT.ename)
FROM
document_sentiment docT
JOIN
entity_sentiment entityT
ON
docT.dId = entityT.dId
GROUP BY
docT.name
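The inflated average follows from the join in this query: each document row is repeated once per matching entity, so the score of document A is counted three times. A minimal sketch that lists the joined rows before any aggregation, assuming BigQuery standard SQL and the table and column names from the question, with the sample data inlined as CTEs:

```sql
#standardSQL
WITH document_sentiment AS (
  SELECT 'A' AS dId, 'n1' AS name, 100 AS score UNION ALL
  SELECT 'B', 'n1', 70
), entity_sentiment AS (
  SELECT 'e1' AS ename, 'a' AS details, 'A' AS dId UNION ALL
  SELECT 'e2', 'a', 'A' UNION ALL
  SELECT 'e3', 'b', 'A' UNION ALL
  SELECT 'e4', 'c', 'B'
)
-- One output row per (document, entity) pair: A appears three times, B once.
SELECT docT.dId, docT.score, entityT.ename
FROM document_sentiment docT
JOIN entity_sentiment entityT
  ON docT.dId = entityT.dId
ORDER BY entityT.ename
-- Rows: (A, 100, e1), (A, 100, e2), (A, 100, e3), (B, 70, e4)
```

Averaging score over these four rows gives 92.5 rather than 85, which is why the score has to be reduced to one value per document (or per name) before or independently of the join.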
How can I get the score shown in the expected output?
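Try the code below
-- Inner query: one row per document, so each score is counted once.
-- Outer query: average those per-document scores and merge the entity lists.
SELECT name, AVG(score) AS average, STRING_AGG(ename) AS entities
FROM (SELECT
docT.name,
docT.dId,
MAX(docT.score) AS score,
STRING_AGG(entityT.ename) AS ename
FROM
document_sentiment docT
JOIN
entity_sentiment entityT
ON
docT.dId = entityT.dId
GROUP BY
docT.name, docT.dId
) sub
GROUP BY name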
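Try this (MySQL syntax, since it uses GROUP_CONCAT)
-- Average per name first, then join back through the documents to reach every dId.
SELECT T.name, T.av AS average,
GROUP_CONCAT(DISTINCT entityT.ename ORDER BY entityT.ename SEPARATOR ', ') AS entities
FROM (
SELECT docT.name, AVG(docT.score) AS av
FROM document_sentiment docT
GROUP BY docT.name) T
JOIN document_sentiment docT ON docT.name = T.name
JOIN entity_sentiment entityT ON docT.dId = entityT.dId
GROUP BY T.name, T.av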
Below is for BigQuery Standard SQL
#standardSQL
SELECT
docT.name,
AVG(docT.score) average,
STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
SELECT dId, STRING_AGG(ename) ename
FROM `project.dataset.entity_sentiment`
GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
You can test and play with the above using the sample data from the question, as in the example below
#standardSQL
WITH `project.dataset.document_sentiment` AS (
SELECT 'A' dId, 'n1' name, 100 score UNION ALL
SELECT 'B', 'n1', 70
), `project.dataset.entity_sentiment` AS (
SELECT 'e1' ename, 'a' details, 'A' dId UNION ALL
SELECT 'e2', 'a', 'A' UNION ALL
SELECT 'e3', 'b', 'A' UNION ALL
SELECT 'e4', 'c', 'B'
)
SELECT
docT.name,
AVG(docT.score) average,
STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
SELECT dId, STRING_AGG(ename) ename
FROM `project.dataset.entity_sentiment`
GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
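with result
+-----+------+---------+-------------+
| Row | name | average | entities    |
+-----+------+---------+-------------+
| 1   | n1   | 85.0    | e1,e2,e3,e4 |
+-----+------+---------+-------------+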
This is tricky. I think window functions may be the simplest solution:
SELECT docT.name, docT.avg_score,
       STRING_AGG(entityT.ename) AS entities
FROM (SELECT docT.*,
             AVG(docT.score) OVER (PARTITION BY docT.name) AS avg_score
      FROM document_sentiment docT
     ) docT JOIN
     entity_sentiment entityT
     ON docT.dId = entityT.dId
GROUP BY docT.name, docT.avg_score;
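Why is this tricky? Well, if you aggregate by name, you lose dId and can no longer do the JOIN.
Aggregating by name beforehand therefore does not fix the problem on its own. Fortunately, a window function solves it: it computes the average per name while keeping every document row, and with it the dId needed for the join.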
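To make the point about window functions concrete, here is a small sketch of the inner step on its own, assuming BigQuery standard SQL and the sample document rows from the question inlined as a CTE. Unlike GROUP BY, the window function does not collapse rows, so dId stays available for the later join:

```sql
#standardSQL
WITH document_sentiment AS (
  SELECT 'A' AS dId, 'n1' AS name, 100 AS score UNION ALL
  SELECT 'B', 'n1', 70
)
-- Every document row is kept; the per-name average is attached to each one.
SELECT
  dId,
  name,
  score,
  AVG(score) OVER (PARTITION BY name) AS avg_score
FROM document_sentiment
ORDER BY dId
-- Rows: (A, n1, 100, 85.0), (B, n1, 70, 85.0)
```

Because avg_score is already the same on every row of a given name, joining these rows to the entity table and grouping by name and avg_score no longer changes the average.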