
Join number of pairs in a single table using SQL

I have two tables of events in BigQuery that look as follows. The main idea is to count the number of events in each table (rows are always pairs of event_id and user_id) and join the counts into a single table that gives, for each pair appearing in either table, the number of events in each.

Table 1:

| event_id | user id |
| -------- | ------- |
| 1        | 1       |
| 2        | 1       |
| 2        | 3       |
| 2        | 5       |
| 1        | 1       |
| 4        | 7       |

Table 2:

| event_id | user id |
| -------- | ------- |
| 1        | 1       |
| 3        | 1       |
| 2        | 3       |

I would like to get a table that has the number of events from each table:

| event_id | user id | num_events_table1 | num_events_table2 |
| -------- | ------- | ----------------- | ----------------- |
| 1        | 1       | 2                 | 1                 |
| 2        | 1       | 1                 | 0                 |
| 2        | 3       | 1                 | 1                 |
| 2        | 5       | 1                 | 0                 |
| 4        | 7       | 1                 | 0                 |
| 3        | 1       | 0                 | 1                 |

Any idea how to do this with SQL? I have tried this:

SELECT i1, e1, num_viewed, num_displayed FROM
(SELECT id as i1, event as e1, count(*) as num_viewed
FROM table_1
group by id, event) a
full outer JOIN (SELECT id as i2, event as e2, count(*) as num_displayed
FROM table_2
group by id, event) b
on a.i1 = b.i2 and a.e1 = b.e2

This does not give exactly what I want: I am getting rows where i1 and e1 are null.

Consider the query below:

#standardSQL
with `project.dataset.table1` as (
  select 1 event_id, 1 user_id union all
  select 2, 1 union all
  select 2, 3 union all
  select 2, 5 union all
  select 1, 1 union all
  select 4, 7 
), `project.dataset.table2` as (
  select 1 event_id, 1 user_id union all
  select 3, 1 union all
  select 2, 3 
)    
select event_id, user_id,
  countif(source = 1) as num_events_table1,
  countif(source = 2) as num_events_table2
from (
  select 1 source, * from `project.dataset.table1`
  union all 
  select 2, * from `project.dataset.table2`
)
group by event_id, user_id  

Applied to the sample data in your question, this returns the six rows of the desired table shown there.

If I understand correctly, the simplest method is to modify your query to use a USING clause along with COALESCE():

SELECT id, event, COALESCE(num_viewed, 0) AS num_viewed, COALESCE(num_displayed, 0) AS num_displayed
FROM (SELECT id, event, count(*) as num_viewed
      FROM table_1
      GROUP BY id, event
     ) t1 FULL JOIN
     (SELECT id , event, COUNT(*) as num_displayed
      FROM table_2
      GROUP BY id, event
     ) t2
     USING (id, event);

Note: This requires that the columns used for the JOIN have the same names in both subqueries. If they do not, you will still need column aliases in the subqueries.
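To see what the FULL JOIN plus COALESCE() is computing, and why the original attempt showed null keys (it selected only the left side's i1 and e1, which are null for pairs that exist only in table_2), here is a rough plain-Python model of the same logic. It is only a sketch: the function name `full_join_counts` is made up for illustration, and the rows are the sample data from the question.

```python
from collections import Counter

# Sample rows from the question, as (event_id, user_id) pairs.
table_1 = [(1, 1), (2, 1), (2, 3), (2, 5), (1, 1), (4, 7)]
table_2 = [(1, 1), (3, 1), (2, 3)]


def full_join_counts(rows_1, rows_2):
    """Illustrative model: GROUP BY on each side, FULL JOIN on the pair, COALESCE(count, 0)."""
    viewed = Counter(rows_1)     # like COUNT(*) ... GROUP BY on table_1
    displayed = Counter(rows_2)  # like COUNT(*) ... GROUP BY on table_2
    # A FULL JOIN keeps every key found on either side; taking the key from
    # the union (not from one side only) is what avoids null keys.
    keys = sorted(set(viewed) | set(displayed))
    # dict.get(key, 0) plays the role of COALESCE(count, 0) for a missing side.
    return [(*key, viewed.get(key, 0), displayed.get(key, 0)) for key in keys]


if __name__ == "__main__":
    for row in full_join_counts(table_1, table_2):
        print(row)
```

In SQL terms, USING (or COALESCE over both sides' key columns when the names differ) does the job of the key union here.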

One way is to aggregate the union:

select event_id, user_id, sum(cnt1) as cnt1, sum(cnt2) as cnt2
from (
    select event_id, user_id, 1 as cnt1, 0 as cnt2
    from table_1
    union all
    select event_id, user_id, 0 as cnt1, 1 as cnt2
    from table_2
) t
group by event_id, user_id
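The union-then-aggregate idea can be sanity-checked on the sample data without a BigQuery project. Below is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in engine, so it is not BigQuery itself. The in-memory tables and the helper name `count_pairs` are illustrative, the columns are written as `event_id` and `user_id`, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows from the question, as (event_id, user_id) pairs.
TABLE_1 = [(1, 1), (2, 1), (2, 3), (2, 5), (1, 1), (4, 7)]
TABLE_2 = [(1, 1), (3, 1), (2, 3)]

# Tag each row with a 1 in the column for its source table, then sum per pair.
QUERY = """
SELECT event_id, user_id, SUM(cnt1) AS cnt1, SUM(cnt2) AS cnt2
FROM (
    SELECT event_id, user_id, 1 AS cnt1, 0 AS cnt2 FROM table_1
    UNION ALL
    SELECT event_id, user_id, 0 AS cnt1, 1 AS cnt2 FROM table_2
) AS t
GROUP BY event_id, user_id
ORDER BY event_id, user_id
"""


def count_pairs(rows_1, rows_2):
    """Hypothetical helper: load both row lists into in-memory SQLite and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("table_1", rows_1), ("table_2", rows_2)):
            conn.execute(f"CREATE TABLE {name} (event_id INTEGER, user_id INTEGER)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in count_pairs(TABLE_1, TABLE_2):
        print(row)
```

Because every row contributes to exactly one of the two counters, no join is involved, and pairs that appear in only one table get a 0 for the other without any COALESCE().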
