SQL - GROUP BY 3 values of the same column
I have this table in GBQ:
ClientID Type Month
XXX A 4
YYY C 4
FFX B 5
FFF B 6
XXX C 6
XXX A 6
YRE C 7
AAR A 7
FFF A 8
EGT B 8
FFF B 9
ETT C 9
I am counting the number of Type per ClientID and Month with this basic query:
SELECT ClientID,
COUNT(DISTINCT Type) NbTypes,
Month
FROM Table
GROUP BY ClientID, Month
The result looks like this:
ClientID NbTypes Month
XXX 1 4
XXX 2 6
FFF 1 6
FFF 1 8
FFF 1 9
... ... ...
What I need to do is count the number of Type per ClientID and, for each Month, over the last 3 months (that month and the two before it).
For example, for ClientID = XXX and Month = 8, I want the count of Type over the rows where Month is 6, 7 or 8.
Is there a way to do this with GROUP BY?
Thank you
You can use a SELECT inside a SELECT (a correlated subquery), if Google BigQuery allows it:
SELECT ClientID,
COUNT(DISTINCT Type) NbTypes,
Month,
MAX((select count(distinct Type)
from Table t2
where t1.ClientID=t2.ClientID
-- a difference of 0 to 2 covers the current month and the two before it (3 months in total)
and t1.month-t2.month between 0 and 2
)
) as NbType_3_months
FROM Table t1
GROUP BY ClientID, Month
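The same trailing-window idea can also be written as a self-join instead of a subquery inside MAX. Below is a minimal sketch that checks the logic on the sample rows from the question. It runs against an in-memory SQLite database as a stand-in, not against BigQuery, and the table name `clients` and the column alias `NbTypes_3_months` are assumptions made for the example. Note that it only returns (ClientID, Month) pairs that actually occur in the data, so a month in which a client has no rows (such as XXX in month 8) does not appear.

```python
import sqlite3

# Sample rows from the question: (ClientID, Type, Month)
ROWS = [
    ("XXX", "A", 4), ("YYY", "C", 4), ("FFX", "B", 5), ("FFF", "B", 6),
    ("XXX", "C", 6), ("XXX", "A", 6), ("YRE", "C", 7), ("AAR", "A", 7),
    ("FFF", "A", 8), ("EGT", "B", 8), ("FFF", "B", 9), ("ETT", "C", 9),
]

# For every (ClientID, Month) pair present in the data, join back to the rows
# of the same client whose month is the current one or one of the two before
# it (difference 0, 1 or 2), then count the distinct types in that window.
QUERY = """
SELECT cur.ClientID,
       cur.Month,
       COUNT(DISTINCT prev.Type) AS NbTypes_3_months
FROM (SELECT DISTINCT ClientID, Month FROM clients) AS cur
JOIN clients AS prev
  ON prev.ClientID = cur.ClientID
 AND cur.Month - prev.Month BETWEEN 0 AND 2
GROUP BY cur.ClientID, cur.Month
ORDER BY cur.ClientID, cur.Month
"""


def trailing_type_counts(rows):
    """Load rows into an in-memory SQLite table and run the self-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clients (ClientID TEXT, Type TEXT, Month INTEGER)")
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


RESULT = trailing_type_counts(ROWS)
for client_id, month, nb_types in RESULT:
    print(client_id, month, nb_types)
```

Because the table has no year column, the window is computed on month numbers only and does not wrap around from January to the previous December.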
You could use HAVING in your statement:
SELECT ClientID,
COUNT(DISTINCT Type) NbTypes,
Month
FROM Table
GROUP BY ClientID, Month
HAVING Month = EXTRACT(MONTH FROM CURRENT_DATE())
OR Month = EXTRACT(MONTH FROM DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH))
OR Month = EXTRACT(MONTH FROM DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 2 MONTH))
Note that your table seems to have no column that determines the year, so this statement selects every row whose month number equals the current month or one of the two preceding months, regardless of year. For example, data from December, November and October of 2021, 2020, 2019, etc. will all be selected by this query.
Also note that I could not test this statement, since I don't use BigQuery.
Here is the source for the date functions: https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions
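To make the year caveat concrete, here is a small sketch of the three month numbers that the HAVING clause above ends up comparing against for a given date. It is plain Python rather than BigQuery, and the function name `last_three_month_numbers` is made up for the example.

```python
from datetime import date


def last_three_month_numbers(today):
    """Return the month numbers (1-12) of today's month and the two before it."""
    # Shift to a zero-based month index so the modulo wraps from January
    # back to December, then shift back to the 1-12 range.
    return [(today.month - 1 - offset) % 12 + 1 for offset in range(3)]


print(last_three_month_numbers(date(2021, 12, 15)))
print(last_three_month_numbers(date(2022, 1, 10)))
```

In January the list contains 1, 12 and 11, and since the table stores only the month number, the filter cannot tell which year those December and November rows belong to.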
You can group rows by ClientID and Month, count the number of types, sort the rows by ClientID in ascending order and by Month in descending order, and then select from each group the rows of the past three months. Handling this in SQL is roundabout, because SQL is only partially set-oriented: you have to get the largest month for each ClientID, find the eligible records through a join filter, and then group and count. A common alternative is to fetch the data out of the database and process it in Python or SPL. SPL, an open-source Java package, is easy to integrate into a Java program and produces much simpler code. It gets the task done with only two lines of code:
| | A |
|---|---|
| 1 | =GBQ.query("SELECT CLIENTID, COUNT(DISTINCT TYPE) AS NBTYPES, MONTH FROM t2 GROUP BY CLIENTID, MONTH ORDER BY CLIENTID, MONTH DESC") |
| 2 | =A1.group@o(#1).run(m=~.#3-3,~=~.select(MONTH>m)).conj() |
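For readers who would rather take the Python route mentioned above, the following is a rough sketch of what the second SPL line appears to do, as I read it: within each client, take the largest month (the first row, since rows are sorted by month descending), subtract 3, and keep only rows with a larger month. The function name `keep_last_three_months` and the tuple layout are assumptions for the example, and the input is the already aggregated result of the SQL in the first line.

```python
from itertools import groupby
from operator import itemgetter


def keep_last_three_months(rows):
    """Keep, per client, the rows within three months of that client's latest month.

    rows: (client_id, nb_types, month) tuples, sorted by client_id ascending
    and then by month descending, as the ORDER BY in the query produces.
    """
    kept = []
    # groupby merges adjacent rows with the same client_id, which is why the
    # input has to be sorted by client first.
    for _, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        cutoff = group[0][2] - 3  # first row of the group holds the largest month
        kept.extend(row for row in group if row[2] > cutoff)
    return kept


# Aggregated rows for two clients, taken from the result table in the question.
SAMPLE = [
    ("FFF", 1, 9), ("FFF", 1, 8), ("FFF", 1, 6),
    ("XXX", 2, 6), ("XXX", 1, 4),
]
for row in keep_last_three_months(SAMPLE):
    print(row)
```

Note that this keeps the per-month rows of each client's most recent three months; it does not merge them into a single distinct count per window.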