
How to group data by the month and year

Before I get into the issue, here's a 2 second background: I've been working on an RFM analysis and, thanks to our peers, was finally able to output an RFM score for each customer_id in my data set, along with each of their individual R, F, and M scores. Here it is, if you're curious or would like to use it for yourself:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (
    SELECT
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM (
        SELECT
            customer_id,
            ntile(5) over (order by last_order_date) AS rfm_recency,
            ntile(5) over (order by count_order) AS rfm_frequency,
            ntile(5) over (order by total_spent) AS rfm_monetary
        FROM (
            SELECT
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
            GROUP BY customer_id
        )
    )
    ORDER BY customer_id desc
)

Here's an image: [image not available]

Now, my issue is that I need to keep my output in this kind of format, but also group the data by month and year. I initially grouped this data by customer_id, because I wanted the RFM and the individual scores to appear only once per unique customer_id, but now I need it by month+year and customer_id (i.e. the first column would be Jan 2018, followed by all the unique customer_id rows for that month/year combo, then Feb 2018, and so on). Does anyone have any suggestions?

Thank you very much, and let me know if you have any questions!

Best, Z

If you want to group by year-month and customer_id, in that order, change your GROUP BY to include both, and carry the year-month column through each level of the query:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (
    SELECT
        YearMonth,
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM (
        SELECT
            YearMonth,
            customer_id,
            ntile(5) over (order by last_order_date) AS rfm_recency,
            ntile(5) over (order by count_order) AS rfm_frequency,
            ntile(5) over (order by total_spent) AS rfm_monetary
        FROM (
            SELECT
                to_char(oms_order_date, 'YYYY-MM') AS YearMonth,
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
            GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id
        )
    )
    ORDER BY YearMonth, customer_id desc
)
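
One thing to be aware of in the query above: the ntile(5) windows have no PARTITION BY, so every customer-month row is ranked against all rows for the whole year. If the scores are instead meant to be computed within each month, one option is to partition each window by the year-month column. Below is a minimal sketch of that idea, not a drop-in replacement. It runs through Python's sqlite3 module so it is self-contained, and several things in it are assumptions: the `orders` table, its columns, and the sample rows are made up; SQLite's strftime('%Y-%m', ...) stands in for to_char(..., 'YYYY-MM'); ntile(2) replaces ntile(5) only because the sample is tiny; and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (customer_id, order_date, amount).
ROWS = [
    (1, "2018-01-05", 10.0),
    (1, "2018-01-20", 15.0),
    (2, "2018-01-11", 100.0),
    (1, "2018-02-02", 500.0),
    (2, "2018-02-03", 20.0),
    (2, "2018-02-17", 30.0),
]

# PARTITION BY year_month restarts the ntile ranking for every month,
# so customers are only compared with others in the same month.
QUERY = """
SELECT
    year_month,
    customer_id,
    ntile(2) OVER (PARTITION BY year_month ORDER BY last_order_date) AS rfm_recency,
    ntile(2) OVER (PARTITION BY year_month ORDER BY count_order)     AS rfm_frequency,
    ntile(2) OVER (PARTITION BY year_month ORDER BY total_spent)     AS rfm_monetary
FROM (
    SELECT
        strftime('%Y-%m', order_date) AS year_month,
        customer_id,
        MAX(order_date) AS last_order_date,
        COUNT(*)        AS count_order,
        SUM(amount)     AS total_spent
    FROM orders
    GROUP BY strftime('%Y-%m', order_date), customer_id
) AS per_month
ORDER BY year_month, customer_id
"""


def monthly_scores(rows):
    """Load the sample rows into an in-memory table and return the scored rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_scores(ROWS):
        print(row)
```

In Redshift or PostgreSQL the same idea would presumably be written as ntile(5) over (partition by YearMonth order by ...) in the middle subquery, leaving the rest of the query unchanged.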

As requested by Antonio:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (
    SELECT
        to_char(oms_order_date, 'YYYY-MM'),
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM (
        SELECT
            customer_id,
            ntile(5) over (order by last_order_date) AS rfm_recency,
            ntile(5) over (order by count_order) AS rfm_frequency,
            ntile(5) over (order by total_spent) AS rfm_monetary
        FROM (
            SELECT
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
            GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id
        )
    )
    ORDER BY customer_id desc
)
LIMIT 100

The error states: "42703: column "oms_order_date" does not exist in derived_table2"

I know for a fact this is a column in this table. Confirmed using: SELECT oms_order_date FROM l_dmw_order_report
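
The error message is consistent with how derived tables expose columns: an outer SELECT can only reference columns that the subquery in its FROM clause actually returns. In the query above, the level that selects to_char(oms_order_date, 'YYYY-MM') reads from a subquery that returns only customer_id and the three ntile columns, so oms_order_date is out of scope at that level even though it exists in l_dmw_order_report. The sketch below reproduces the same pattern on a small scale and shows one likely fix, which is to compute the year-month value in the innermost query and select its alias further out. It is only an illustration under stated assumptions: it uses Python's sqlite3 module, the `orders` table and its rows are made up, strftime stands in for to_char, and SQLite's error text differs from the 42703 message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for l_dmw_order_report.
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2018-01-05"), (1, "2018-02-02"), (2, "2018-01-11")],
)


def run(sql):
    """Return the rows, or the error message if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Fails: the subquery returns only customer_id and count_order,
# so the outer query has no order_date column to read.
broken = run("""
    SELECT strftime('%Y-%m', order_date) AS year_month, customer_id
    FROM (
        SELECT customer_id, COUNT(*) AS count_order
        FROM orders
        GROUP BY strftime('%Y-%m', order_date), customer_id
    ) AS t
""")

# Works: year_month is computed where order_date is visible,
# then referenced by its alias in the outer query.
fixed = run("""
    SELECT year_month, customer_id, count_order
    FROM (
        SELECT
            strftime('%Y-%m', order_date) AS year_month,
            customer_id,
            COUNT(*) AS count_order
        FROM orders
        GROUP BY strftime('%Y-%m', order_date), customer_id
    ) AS t
    ORDER BY year_month, customer_id
""")

if __name__ == "__main__":
    print("broken:", broken)
    print("fixed:", fixed)
```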


 