BigQuery join of three tables
I am trying to join three tables in BigQuery. Table 1 has records of one event (i.e. each row is one record), table 2 has records of a second event, and table 3 has category names.
I want to produce a final table that has counts for table 1 and table 2 by category and device platform. However, every time I run this I get an error that says joined.t3.category is not a field of either table in the join.
Here's my current code:
Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category
from
(
SELECT
Id,
Origin,
CatId
FROM [testing.table_1] as t1
JOIN (SELECT category,
CategoryID
FROM [testing.table_3]) as t3
on t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id,
CategoryId
FROM [testing.table_2]) as t2
ON (joined.t1.CatId = t2.CategoryId)
Group by platform,category;
For reference, here's a simpler join between tables 1 and 2 that works perfectly:
Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform
from testing.table_1 as t1
JOIN testing.table_2 as t2
on t1.CatId = t2.CategoryId
Group by platform;
Can you try using standard SQL for your query instead? It has better handling of aliases, and COUNT(DISTINCT ...) will give you an exact result rather than an approximation as in legacy SQL. If it helps, the only change you should need to make to your query is to use backticks rather than brackets to escape your table names. For example:
SELECT
COUNT(DISTINCT joined.t1.Id) as t1_events,
COUNT(DISTINCT t2.Id) as t2_events,
joined.t1.Origin as platform,
joined.t3.category as category
FROM (
SELECT
Id,
Origin,
CatId
FROM `testing.table_1` AS t1
JOIN (
SELECT
category,
CategoryID
FROM `testing.table_3`
) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
SELECT
Id,
CategoryId
FROM `testing.table_2`
) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category;
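To illustrate the exact-versus-approximate point, here is a minimal standard SQL sketch. The `events` data is hypothetical and generated inline, so it does not touch the tables from the question; `APPROX_COUNT_DISTINCT` is included only as the explicitly approximate counterpart for comparison.

```sql
-- Hypothetical inline data: ids 1..5000, each appearing twice.
WITH events AS (
  SELECT id FROM UNNEST(GENERATE_ARRAY(1, 5000)) AS id
  UNION ALL
  SELECT id FROM UNNEST(GENERATE_ARRAY(1, 5000)) AS id
)
SELECT
  COUNT(DISTINCT id) AS exact_ids,          -- exact in standard SQL
  APPROX_COUNT_DISTINCT(id) AS approx_ids   -- statistical estimate
FROM events;
```

In standard SQL the approximation has to be requested explicitly, which is the reverse of the legacy SQL default described above.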
The simple fix is to add the category field to the first inner SELECT; otherwise it is not visible to the outermost SELECT, hence the error.
That was the issue!
Also, in BigQuery Legacy SQL you can use EXACT_COUNT_DISTINCT; otherwise you get a statistical approximation. See more in COUNT([DISTINCT]).
So, for Legacy SQL your query can look like:
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Moreover, I feel like you can simplify it further (assuming there are no ambiguous fields):
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN [testing.table_3] AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Of course, you will need to apply the same fix if you use the Standard SQL version of it (as Elliott has suggested):
SELECT
COUNT(DISTINCT joined.t1.Id) AS t1_events,
COUNT(DISTINCT t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM `testing.table_1` AS t1
JOIN `testing.table_3` AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
I don't know google-bigquery, but my SQL knowledge tells me that having two aliases before the column name causes a problem. Try removing the t aliases after the joined one; for example, use joined.category instead of joined.t3.category.
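As a rough sketch of that suggestion in standard SQL: the subquery below exposes only the columns it selects, so the outer query refers to them through the single alias `joined`. The inline rows are hypothetical stand-ins for `testing.table_1`, `testing.table_2` and `testing.table_3`, and the column values are made up for illustration.

```sql
-- Hypothetical inline rows standing in for the three tables in the question.
WITH table_1 AS (
  SELECT 1 AS Id, 'ios' AS Origin, 10 AS CatId UNION ALL
  SELECT 2, 'android', 10 UNION ALL
  SELECT 3, 'ios', 20
),
table_2 AS (
  SELECT 101 AS Id, 10 AS CategoryId UNION ALL
  SELECT 102, 20
),
table_3 AS (
  SELECT 'books' AS category, 10 AS CategoryID UNION ALL
  SELECT 'games', 20
)
SELECT
  COUNT(DISTINCT joined.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.Origin AS platform,
  joined.category AS category
FROM (
  -- Only the columns listed here are visible outside the subquery,
  -- so category has to be selected explicitly.
  SELECT t1.Id, t1.Origin, t1.CatId, t3.category
  FROM table_1 AS t1
  JOIN table_3 AS t3
    ON t1.CatId = t3.CategoryID
) AS joined
JOIN table_2 AS t2
  ON joined.CatId = t2.CategoryId
GROUP BY platform, category;
```

Inside the subquery the `t1` and `t3` aliases are still in scope; outside it, only `joined` and `t2` are, which is why `joined.t3.category` has nothing to resolve against.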