
BigQuery join of three tables

I am trying to join three tables in BigQuery. Table 1 has records of one event (i.e. each row is one record), table 2 has records of a second event, and table 3 has category names.

I want to produce a final table that has counts for table 1 and table 2 by category and device platform. However, every time I run this I get an error that says joined.t3.category is not a field of either table in the join.

Here's my current code:

SELECT
  COUNT(DISTINCT joined.t1.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id,
    Origin,
    CatId
  FROM [testing.table_1] AS t1
  JOIN (
    SELECT
      category,
      CategoryID
    FROM [testing.table_3]
  ) AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
  SELECT
    Id,
    CategoryId
  FROM [testing.table_2]
) AS t2
ON (joined.t1.CatId = t2.CategoryId)
GROUP BY platform, category;

For reference, here's a simpler join between tables 1 and 2 that works perfectly:

SELECT
  COUNT(DISTINCT t1.Id) AS t1_event,
  COUNT(DISTINCT t2.Id) AS t2_events,
  t1.Origin AS platform
FROM testing.table_1 AS t1
JOIN testing.table_2 AS t2
ON t1.CatId = t2.CategoryId
GROUP BY platform;

Can you try using standard SQL for your query instead? It has better handling of aliases, and COUNT(DISTINCT ...) will give you an exact result rather than an approximation as in legacy SQL. If it helps, the only change you should need to make to your query is to use backticks to escape your table names rather than brackets. For example:

SELECT
  COUNT(DISTINCT joined.t1.Id) as t1_events,
  COUNT(DISTINCT t2.Id) as t2_events,
  joined.t1.Origin as platform,
  joined.t3.category as category
FROM (
  SELECT 
    Id,
    Origin,
    CatId
  FROM `testing.table_1` AS t1
  JOIN (
    SELECT
      category,
      CategoryID
    FROM `testing.table_3`
  ) AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
  SELECT
    Id,
    CategoryId
  FROM `testing.table_2`
) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category;

The simple fix is to add the category field to the first inner SELECT. Otherwise it is not visible to the outermost SELECT, which is what causes the error.
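The visibility rule can be tried out without a BigQuery project. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in: SQLite is not BigQuery, the error text will differ, and the unqualified table names and sample rows are made up for illustration. It shows that an outer query can only reference columns that the subquery's SELECT list exposes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_3 (CategoryID INTEGER, category TEXT);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'android', 20);
    INSERT INTO table_3 VALUES (10, 'books'), (20, 'games');
""")

# The inner SELECT omits category, so the outer query cannot reference it.
missing = """
    SELECT joined.category
    FROM (
        SELECT t1.Id, t1.Origin, t1.CatId
        FROM table_1 AS t1
        JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
"""
try:
    conn.execute(missing)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Adding category to the inner SELECT makes it visible to the outer query.
fixed = """
    SELECT joined.Origin, joined.category
    FROM (
        SELECT t1.Id, t1.Origin, t1.CatId, t3.category
        FROM table_1 AS t1
        JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
    ORDER BY joined.Id
"""
rows = conn.execute(fixed).fetchall()

print("without category in inner SELECT:", error)
print("with category in inner SELECT:", rows)
```

The first query fails because the subquery never selects category; the second succeeds once the column is added, which mirrors the fix described above.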

Also, in BigQuery Legacy SQL you can use EXACT_COUNT_DISTINCT; otherwise you get a statistical approximation. See more in the COUNT([DISTINCT]) documentation.

So, for Legacy SQL your query can look like:

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Moreover, I feel like you can simplify it further (assuming there will be no ambiguous fields):

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN [testing.table_3] AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Of course, you will need to make the same fix if you use the Standard SQL version of it (as Elliott has suggested):

SELECT
  COUNT(DISTINCT joined.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.Origin AS platform,
  joined.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM `testing.table_1` AS t1
  JOIN `testing.table_3` AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.CatId = t2.CategoryId
GROUP BY platform, category

I don't know google-bigquery, but my SQL knowledge tells me that having two aliases before the column name causes a problem. Try removing the t aliases after joined; for example, use joined.category instead of joined.t3.category.
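To see the single-alias form working end to end, here is a hedged sketch that again uses Python's sqlite3 module as a stand-in for BigQuery (an assumption; alias handling in BigQuery Legacy SQL in particular may differ). The unqualified table names and all sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_2 (Id INTEGER, CategoryId INTEGER);
    CREATE TABLE table_3 (CategoryID INTEGER, category TEXT);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'ios', 10), (3, 'android', 20);
    INSERT INTO table_2 VALUES (100, 10), (101, 10), (102, 10), (103, 20);
    INSERT INTO table_3 VALUES (10, 'books'), (20, 'games');
""")

# Columns of the subquery are referenced through the single alias "joined".
query = """
    SELECT
        COUNT(DISTINCT joined.Id) AS t1_events,
        COUNT(DISTINCT t2.Id) AS t2_events,
        joined.Origin AS platform,
        joined.category AS category
    FROM (
        SELECT t1.Id, t1.Origin, t1.CatId, t3.category
        FROM table_1 AS t1
        JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
    JOIN table_2 AS t2 ON joined.CatId = t2.CategoryId
    GROUP BY platform, category
    ORDER BY platform, category
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

With this sample data, the join on category produces six rows for the ios/books group (two table 1 events times three table 2 events), and COUNT(DISTINCT ...) collapses them back to 2 and 3, so the per-table counts are not inflated by the join.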
