Selecting max date of each month
I have a table with a lot of cumulative columns; these columns reset to 0 at the end of each month. If I sum this data, I'll end up double counting. Instead, with Hive, I'm trying to select the max date of each month.
I've tried this:
SELECT
yyyy_mm_dd,
id,
name,
cumulative_metric1,
cumulative_metric2
FROM
mytable
WHERE
yyyy_mm_dd = last_day(yyyy_mm_dd)
mytable has daily data from the start of the year. In the output of the above, I only see the last date for January but not February. How can I select the last day of each month?
February is not over yet, so no row in the table satisfies yyyy_mm_dd = last_day(yyyy_mm_dd) for that month. Perhaps a window function does what you want:
SELECT yyyy_mm_dd, id, name, cumulative_metric1, cumulative_metric2
FROM (SELECT t.*,
MAX(yyyy_mm_dd) OVER (PARTITION BY last_day(yyyy_mm_dd)) as last_yyyy_mm_dd
FROM mytable t
) t
WHERE yyyy_mm_dd = last_yyyy_mm_dd;
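This calculates the last day in the data for each month. For a month that is still in progress, that is the most recent date loaded rather than the calendar month end.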
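To check the behaviour on a small sample, here is a minimal sketch that runs the same window-function pattern in SQLite from Python. This is only a stand-in for Hive: SQLite has no last_day(), so the partition key is assumed to be substr(yyyy_mm_dd, 1, 7) (the year-month prefix of an ISO date string), the table has a reduced set of columns, and the sample rows are made up. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Made-up sample: January is complete, February has data only up to the 12th.
SAMPLE_ROWS = [
    ("2020-01-30", 1, 90),
    ("2020-01-31", 1, 100),
    ("2020-02-01", 1, 5),
    ("2020-02-12", 1, 40),
]


def latest_row_per_month(rows):
    """Return the rows dated on the latest day present in each month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (yyyy_mm_dd TEXT, id INTEGER, cumulative_metric1 INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # substr(yyyy_mm_dd, 1, 7) plays the role of last_day(yyyy_mm_dd) in Hive:
    # both give one partition key per calendar month.
    query = """
        SELECT yyyy_mm_dd, id, cumulative_metric1
        FROM (SELECT t.*,
                     MAX(yyyy_mm_dd) OVER (PARTITION BY substr(yyyy_mm_dd, 1, 7))
                         AS last_yyyy_mm_dd
              FROM mytable t) t
        WHERE yyyy_mm_dd = last_yyyy_mm_dd
        ORDER BY yyyy_mm_dd, id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_row_per_month(SAMPLE_ROWS):
        print(row)
```

With this sample, February contributes its row for the 12th even though the month has not ended, which is the row the original last_day() filter could not return.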
Use a correlated subquery together with Hive's month() function:
SELECT
yyyy_mm_dd,
id,
name,
cumulative_metric1,
cumulative_metric2
FROM
mytable t1
WHERE
-- the scalar subquery has to be wrapped in parentheses
yyyy_mm_dd = (
  SELECT MAX(t2.yyyy_mm_dd)
  FROM mytable t2
  WHERE month(t1.yyyy_mm_dd) = month(t2.yyyy_mm_dd)
)
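One thing to watch: the query above matches rows on month() alone, so once the table spans more than one year, February 2019 and February 2020 would share a single maximum. A possible variant, sketched below, computes the maximum per year-month with GROUP BY and joins it back to the table. It is run in SQLite from Python purely for illustration; the reduced table layout, the sample rows, and the use of substr(yyyy_mm_dd, 1, 7) as the year-month key are assumptions, not part of the original question.

```python
import sqlite3

# Made-up sample spanning two years, both with February data.
TWO_YEAR_ROWS = [
    ("2019-02-27", 1, 70),
    ("2019-02-28", 1, 80),
    ("2020-02-11", 1, 30),
    ("2020-02-12", 1, 40),
]


def month_end_rows_by_join(rows):
    """Return the rows dated on the latest day present in each year-month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (yyyy_mm_dd TEXT, id INTEGER, cumulative_metric1 INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # The derived table t2 holds one row per year-month with its latest date;
    # joining on that date keeps only the matching rows of mytable.
    query = """
        SELECT t1.yyyy_mm_dd, t1.id, t1.cumulative_metric1
        FROM mytable t1
        JOIN (SELECT substr(yyyy_mm_dd, 1, 7) AS yyyy_mm,
                     MAX(yyyy_mm_dd) AS last_yyyy_mm_dd
              FROM mytable
              GROUP BY substr(yyyy_mm_dd, 1, 7)) t2
          ON t1.yyyy_mm_dd = t2.last_yyyy_mm_dd
        ORDER BY t1.yyyy_mm_dd, t1.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in month_end_rows_by_join(TWO_YEAR_ROWS):
        print(row)
```

Both Februaries keep their own last row here, whereas matching on month() alone would keep only the 2020 one.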