SQL join: all values from both sides with a cumulative condition (Presto/AWS Athena)

Question

I've been looking at this seemingly simple problem for a while without finding a solution. Assume I have one table with a list of dates and another table with names, dates, and phone numbers. I need a final result that pairs every name with every date, plus a third column giving the number of unique phone numbers that appear on any date equal to or later than the date in that row. Here is an example:

t1
+------------+
|    date    |
+------------+
| 01/01/2020 |
| 01/02/2020 |
| 01/03/2020 |
| 01/04/2020 |
| 01/05/2020 |
| 01/06/2020 |
| 01/07/2020 |
| 01/08/2020 |
+------------+

t2
+------+------------+--------------+
| name |    date    | phone_number |
+------+------------+--------------+
| John | 01/01/2020 |          123 |
| Mike | 01/02/2020 |          456 |
| Mike | 01/03/2020 |          789 |
| John | 01/04/2020 |          999 |
| Mike | 01/05/2020 |          111 |
| John | 01/06/2020 |          777 |
| Mike | 01/07/2020 |          123 |
| Mike | 01/08/2020 |          456 |
| John | 01/01/2020 |          789 |
| John | 01/02/2020 |          789 |
| Mike | 01/03/2020 |          789 |
| John | 01/04/2020 |          789 |
+------+------------+--------------+

The result I am aiming for:

+------+------------+-----------------------------------------------------------------+
| Name |   Month    | Cumulative Unique Numbers (Unique Numbers in any date >= Month) |
+------+------------+-----------------------------------------------------------------+
| John | 01/01/2020 |                                                               4 |
| John | 01/02/2020 |                                                               3 |
| John | 01/03/2020 |                                                               3 |
| John | 01/04/2020 |                                                               3 |
| John | 01/05/2020 |                                                               1 |
| John | 01/06/2020 |                                                               1 |
| John | 01/07/2020 |                                                               0 |
| John | 01/08/2020 |                                                               0 |
| Mike | 01/01/2020 |                                                               4 |
| Mike | 01/02/2020 |                                                               4 |
| Mike | 01/03/2020 |                                                               4 |
| Mike | 01/04/2020 |                                                               3 |
| Mike | 01/05/2020 |                                                               3 |
| Mike | 01/06/2020 |                                                               2 |
| Mike | 01/07/2020 |                                                               2 |
| Mike | 01/08/2020 |                                                               1 |
+------+------------+-----------------------------------------------------------------+

I tried many approaches, and this is the closest I got:

SELECT * FROM t1
LEFT OUTER JOIN
(SELECT t1.date, COUNT(DISTINCT phone_number) count, name FROM t1
LEFT OUTER JOIN
t2
ON t1.date < t2.date
GROUP BY t1.date,t2.name
ORDER BY 2 DESC) temp
ON t1.date = temp.date

I still get missing rows in the final result.

This is what I am getting:

+------+------------+-------+
| name |    date    | count |
+------+------------+-------+
| null | 2020-08-01 |     0 |
| John | 2020-01-01 |     3 |
| John | 2020-02-01 |     3 |
| John | 2020-03-01 |     3 |
| John | 2020-04-01 |     1 |
| John | 2020-05-01 |     1 |
| Mike | 2020-01-01 |     4 |
| Mike | 2020-02-01 |     4 |
| Mike | 2020-03-01 |     3 |
| Mike | 2020-04-01 |     3 |
| Mike | 2020-05-01 |     2 |
| Mike | 2020-06-01 |     2 |
| Mike | 2020-07-01 |     1 |
+------+------------+-------+

Answer 1

Using a calendar table approach, we can build a reference table consisting of all names along with all dates. Then, left join this to your second table, which contains the actual data:

SELECT
    b.name,
    a.date,
    COUNT(DISTINCT t.phone_number) AS unique_numbers
FROM t1 a
CROSS JOIN (SELECT DISTINCT name FROM t2) b
LEFT JOIN t2 t
    -- >= (not =) so each row counts numbers seen on that date or later
    ON t.date >= a.date AND b.name = t.name
GROUP BY
    b.name,
    a.date
ORDER BY
    b.name,
    a.date;
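
To try the calendar-table approach without real tables, here is a minimal sketch that inlines the question's sample data as CTEs. Two things are assumptions on my part: the dates are read as month starts (day/month/year, matching the `2020-08-01` style in the asker's output), and the join is written as `t.date >= a.date` so that each row counts numbers seen on that date or later. On this data it should reproduce the target table from the question.

```sql
-- Sample data from the question, inlined so the query needs no real tables.
-- Assumption: dates are month starts (the question shows them as dd/mm/yyyy).
WITH t1 AS (
    SELECT * FROM (VALUES
        (DATE '2020-01-01'), (DATE '2020-02-01'),
        (DATE '2020-03-01'), (DATE '2020-04-01'),
        (DATE '2020-05-01'), (DATE '2020-06-01'),
        (DATE '2020-07-01'), (DATE '2020-08-01')
    ) AS v (date)
),
t2 AS (
    SELECT * FROM (VALUES
        ('John', DATE '2020-01-01', 123),
        ('Mike', DATE '2020-02-01', 456),
        ('Mike', DATE '2020-03-01', 789),
        ('John', DATE '2020-04-01', 999),
        ('Mike', DATE '2020-05-01', 111),
        ('John', DATE '2020-06-01', 777),
        ('Mike', DATE '2020-07-01', 123),
        ('Mike', DATE '2020-08-01', 456),
        ('John', DATE '2020-01-01', 789),
        ('John', DATE '2020-02-01', 789),
        ('Mike', DATE '2020-03-01', 789),
        ('John', DATE '2020-04-01', 789)
    ) AS v (name, date, phone_number)
)
SELECT
    b.name,
    a.date,
    -- COUNT ignores the NULLs produced by unmatched left-join rows,
    -- so name/date pairs with no later numbers come out as 0.
    COUNT(DISTINCT t.phone_number) AS unique_numbers
FROM t1 a
-- Every date paired with every name: the reference ("calendar") table.
CROSS JOIN (SELECT DISTINCT name FROM t2) b
-- Attach all of that person's rows dated on or after the reference date.
LEFT JOIN t2 t
    ON t.date >= a.date AND t.name = b.name
GROUP BY
    b.name,
    a.date
ORDER BY
    b.name,
    a.date;
```

Because the names come from the cross join rather than from the joined `t2` rows, a pair such as John with `2020-07-01` still appears with a count of 0 instead of dropping out or showing a null name.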

Solution 1
2 Accepted 2020-05-11 04:50:24
