Using () OVER or HAVING clause to get monthly aggregates of counts

I have a large dataset of ticket sales covering a single year. The schema I am working with is:

ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)   
ticket_price (float)
total_price (float)

For each month of the year, I am trying to find the weekday on which the highest number of tickets was sold. For example, the output would be:

year  month  weekday  total_tickets
2015  01     SAT      5400
2015  02     SUN      4300
2015  03     SUN      6400

I tried the following, but admittedly SQL is not my strongest skill:

SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
      EXTRACT(MONTH FROM date_time_sale) AS MONTH,
      week_day,
      RANK () OVER (PARTITION BY YEAR, MONTH ORDER BY count(week_day) ASC) weekday_count
      from ticket_sales
      order by YEAR, MONTH

But I keep running into errors. I tried using a HAVING clause, but I couldn't get anywhere. Any tips on how to use RANK () OVER (PARTITION BY) effectively to get this output? Or do I need to use COUNT () OVER?

The analysis exception says:

`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358 

The full error is much longer than this.

Update:

So I tried this code:

SELECT DISTINCT year,
       month,
       week_day,
       COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count
FROM ticket_sales
ORDER BY year, month, weekday_count DESC

That returns a count for every weekday of every month, so the output has 12*7 rows instead of 12. I still have some way to go, but at least I am getting somewhere.

Try this query and let me know if it returns the desired result:

I'm not sure whether the field name is number_tickets or total_tickets; I used number_tickets.

First I sum the number of tickets by year, month and weekday, then return one row per year and month with the weekday on which the most tickets were sold.

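-- Step 1: total tickets sold per year, month and weekday.
WITH total_by_day AS (
    SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
           EXTRACT(MONTH FROM date_time_sale) AS MONTH,
           week_day,
           SUM(number_tickets) AS number_tickets
    FROM ticket_sales
    GROUP BY YEAR, MONTH, week_day
)
-- Step 2: within each month, take the weekday and total from the row with the highest total.
-- FIRST_VALUE yields the same values on every row of a month, so DISTINCT leaves one row per month.
SELECT DISTINCT
       YEAR,
       MONTH,
       FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
       FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;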

In a PostgreSQL database I got the desired result.
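
Since the question asks specifically about RANK () OVER (PARTITION BY), here is a sketch of the same idea written with RANK instead of FIRST_VALUE. It aggregates first, ranks the weekdays inside each month in a second step, and then keeps only rank 1. The names sale_year, sale_month and rnk are placeholders I made up, and week_day follows the queries in the question rather than the schema listing (which says weekday), so adjust it to the real column name. I repeat the EXTRACT expressions in the GROUP BY instead of grouping by alias because, in most dialects as far as I know, an alias defined in a SELECT list cannot be referenced by a window clause in that same SELECT list, and not every dialect accepts aliases in GROUP BY either. I have not run this against your data or on Spark.

```sql
-- Step 1: total tickets per year, month and weekday.
WITH total_by_day AS (
    SELECT EXTRACT(YEAR FROM date_time_sale)  AS sale_year,   -- placeholder alias
           EXTRACT(MONTH FROM date_time_sale) AS sale_month,  -- placeholder alias
           week_day,                                          -- assumed column name
           SUM(number_tickets)                AS total_tickets
    FROM ticket_sales
    GROUP BY EXTRACT(YEAR FROM date_time_sale),
             EXTRACT(MONTH FROM date_time_sale),
             week_day
),
-- Step 2: rank the weekdays of each month by their total, highest first.
ranked AS (
    SELECT sale_year,
           sale_month,
           week_day,
           total_tickets,
           RANK() OVER (PARTITION BY sale_year, sale_month
                        ORDER BY total_tickets DESC) AS rnk
    FROM total_by_day
)
-- Step 3: keep only the top-ranked weekday of each month.
SELECT sale_year, sale_month, week_day, total_tickets
FROM ranked
WHERE rnk = 1
ORDER BY sale_year, sale_month;
```

If two weekdays tie for the highest total in a month, RANK gives both of them rank 1, so that month appears twice. Replacing RANK() with ROW_NUMBER() should return exactly one row per month, at the cost of picking one of the tied weekdays arbitrarily.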
