
How to sum on different intervals to find multi-year peaks

I am trying to find the historical peak in sales of items over consecutive multi-year periods. My problem is that some items were sold in the past and then discontinued, but they still need to be part of the analysis.

I've worked through some for loops in R, but I am unsure how to sum multiple consecutive years and then compare each sum against the other local maxima within the same dataset. For example:

Year      Item            Sales
2001      Trash Can       100
2002      Trash Can       125
2003      Trash Can       90
2004      Trash Can       97
2002      Red Balloon     23
2003      Red Balloon     309
2004      Red Balloon     67
2005      Red Balloon     8
1998      Blue Bottle     600
1999      Blue Bottle     565

Based on the above data, if I wanted to calculate the 2-year peak of sales, I would want to output Blue Bottle 1165 (sum of 1998 and 1999), Red Balloon 376 (sum of 2003 and 2004) and Trash Can 225 (sum of 2001 and 2002). However, if I wanted a 3-year peak, Blue Bottle would be ineligible because it only has 2 years of data.

If I wanted to calculate the 3-year peak of sales, I would want to output Red Balloon 399 (sum of 2002 to 2004) and Trash Can 315 (sum of 2001 to 2003).
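
To make the requirement concrete, here is a minimal sketch of the sliding-window computation in plain Python (chosen only as a neutral illustration; it is not the R or SQL the question asks for). It assumes one row per item per year with no missing years, and the name `peak_sales` is purely illustrative.

```python
from collections import defaultdict

# Sample data from the question: (year, item, sales)
ROWS = [
    (2001, "Trash Can", 100), (2002, "Trash Can", 125),
    (2003, "Trash Can", 90), (2004, "Trash Can", 97),
    (2002, "Red Balloon", 23), (2003, "Red Balloon", 309),
    (2004, "Red Balloon", 67), (2005, "Red Balloon", 8),
    (1998, "Blue Bottle", 600), (1999, "Blue Bottle", 565),
]


def peak_sales(rows, k):
    """Return {item: best sum over k consecutive rows}.

    Assumption: each item has one row per year and no gaps between years.
    Items with fewer than k years of data are left out of the result.
    """
    by_item = defaultdict(list)
    for year, item, sales in sorted(rows):  # sorting puts each item's rows in year order
        by_item[item].append(sales)

    peaks = {}
    for item, sales in by_item.items():
        if len(sales) < k:
            continue  # not enough history for a k-year window
        window = sum(sales[:k])
        best = window
        for i in range(k, len(sales)):
            # Slide the window one year forward: add the new year, drop the oldest.
            window += sales[i] - sales[i - k]
            best = max(best, window)
        peaks[item] = best
    return peaks


print(peak_sales(ROWS, 2))
print(peak_sales(ROWS, 3))
```

With `k = 2` all three items are returned; with `k = 3` Blue Bottle is dropped because it has only two years of data.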

In SQL, you can use window functions. For the eligible 2-year sales:

    select item, year, two_year_sales
    from (select t.*,
                 sum(sales) over (partition by item order by year rows between 1 preceding and current row) as two_year_sales,
                 row_number() over (partition by item order by year) as seqnum
          from t
         ) t
    where seqnum >= 2;

And to get the peak:

select t.*   
from (select item, two_year_sales, year,
             max(two_year_sales) over (partition by item) as max_two_year_sales
      from (select t.*,
                   sum(sales) over (partition by item order by year rows between 1 preceding and current row) as two_year_sales,
                   row_number() over (partition by item order by year) as seqnum
            from t
           ) t
      where seqnum >= 2
     ) t
where two_year_sales = max_two_year_sales;
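
The query above hard-codes a two-year window. As a hedged sketch, the same shape can be parameterised by the window length `k` and tried against the question's sample data using SQLite from Python. The table name `t` and its columns follow the answer; the helper names `peak_query` and `peaks` and the column alias `k_year_sales` are made up for this example. It assumes an SQLite build with window-function support (3.25 or later) and, like the original query, it counts rows rather than calendar years, so it assumes no missing years per item.

```python
import sqlite3

ROWS = [
    (2001, "Trash Can", 100), (2002, "Trash Can", 125),
    (2003, "Trash Can", 90), (2004, "Trash Can", 97),
    (2002, "Red Balloon", 23), (2003, "Red Balloon", 309),
    (2004, "Red Balloon", 67), (2005, "Red Balloon", 8),
    (1998, "Blue Bottle", 600), (1999, "Blue Bottle", 565),
]


def peak_query(k):
    """Build the peak query for a k-row window (hypothetical helper)."""
    k = int(k)  # interpolated into the SQL text, so force a plain integer
    if k < 1:
        raise ValueError("k must be at least 1")
    return f"""
    select item, year, k_year_sales
    from (select item, year, k_year_sales,
                 max(k_year_sales) over (partition by item) as max_k_year_sales
          from (select t.*,
                       sum(sales) over (partition by item order by year
                                        rows between {k - 1} preceding and current row) as k_year_sales,
                       row_number() over (partition by item order by year) as seqnum
                from t
               ) t
          where seqnum >= {k}
         ) t
    where k_year_sales = max_k_year_sales
    order by item
    """


def peaks(k):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (year integer, item text, sales integer)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = conn.execute(peak_query(k)).fetchall()
    conn.close()
    return result


for row in peaks(2):
    print(row)
```

The `year` in each result row is the last year of the peak window. If two windows tie for the maximum, both rows are returned.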

A solution in R using the tidyverse and RcppRoll:

#Loading the packages and your data as a `tibble`
library("RcppRoll")
library("dplyr")

tbl <- tribble(
  ~Year,     ~Item,          ~Sales,
  2001,      "Trash Can",       100,
  2002,      "Trash Can",       125,
  2003,      "Trash Can",       90,
  2004,      "Trash Can",       97,
  2002,      "Red Balloon",     23,
  2003,      "Red Balloon",     309,
  2004,      "Red Balloon",      67,
  2005,      "Red Balloon",     8,
  1998,      "Blue Bottle",     600,
  1999,      "Blue Bottle",     565
)

# Set the number of consecutive years
n <- 2

# Compute the rolling sums (assumes data to be sorted) and take max
res <- tbl %>% 
  group_by(Item) %>% 
  mutate(rollingsum = roll_sumr(Sales, n)) %>% 
  summarize(best_sum = max(rollingsum, na.rm = TRUE))
print(res)
## A tibble: 3 x 2
#  Item        best_sum
#  <chr>          <dbl>
#1 Blue Bottle     1165
#2 Red Balloon      376
#3 Trash Can        225

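Setting n <- 3 yields a different res. Blue Bottle has fewer than three years of data, so every rolling sum for it is NA and max(..., na.rm = TRUE) returns -Inf (with a warning):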

print(res)
## A tibble: 3 x 2
#  Item        best_sum
#  <chr>          <dbl>
#1 Blue Bottle     -Inf
#2 Red Balloon      399
#3 Trash Can        315

I can only help you with the SQL part: use GROUP BY with HAVING. HAVING filters out every item that does not have the specified minimum number of years of historical data.

Check whether this query fits your requirements.

SELECT 
     item
     , count(*) as num_years
     , sum(Sales) as local_max 
from [your_table] 
where year between [year_ini] and [year_end]
group by item 
having count(*) >= [number_of_years]

Read the data dat (shown reproducibly in the Note at the end) into a zoo series with one column per Item, then convert it to a ts series tt, which fills in the missing years with NA. Then use rollsumr to take the sum of every k consecutive years for each Item, find the maximum value for each Item, stack the result into a data frame, and omit any NA rows. The function Max is like max(x, na.rm = TRUE), except that if x is all NAs it returns NA instead of -Inf and does not issue a warning. stack outputs the item column second, so reverse the columns using 2:1 and add nicer names.

library(zoo)

Max <- function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)

peak <- function(data, k) {
  tt <- as.ts(read.zoo(data, split = "Item"))
  s <- na.omit(stack(apply(rollsumr(tt, k), 2, Max)))
  setNames(s[2:1], c("Item", "Sum"))
}

peak(dat, 2)
##          Item  Sum
## 1 Blue Bottle 1165
## 2 Red Balloon  376
## 3   Trash Can  225

peak(dat, 3)
##          Item Sum
## 2 Red Balloon 399
## 3   Trash Can 315
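
One consequence of filling missing years with NA is that a window spanning a gap appears to be excluded, because its sum is NA. The following plain-Python sketch illustrates that idea only and is not a translation of the zoo code. The function name `peak_with_gaps` and the `Widget` rows are hypothetical, invented here to show a gap; they are not part of the question's data.

```python
# Hypothetical rows, invented for illustration: "Widget" has no data for 2003.
GAP_ROWS = [
    (2001, "Widget", 50),
    (2002, "Widget", 60),
    (2004, "Widget", 500),
    (2005, "Widget", 10),
]


def peak_with_gaps(rows, k):
    """Return {item: best sum over k consecutive calendar years}.

    A window counts only if the item has a row for every one of its k years,
    so windows that would span a missing year are skipped.
    """
    by_item = {}
    for year, item, sales in rows:
        by_item.setdefault(item, {})[year] = sales

    peaks = {}
    for item, by_year in by_item.items():
        sums = [
            sum(by_year[y] for y in range(start, start + k))
            for start in by_year
            if all(y in by_year for y in range(start, start + k))
        ]
        if sums:  # items with no complete k-year window are omitted
            peaks[item] = max(sums)
    return peaks


print(peak_with_gaps(GAP_ROWS, 2))
print(peak_with_gaps(GAP_ROWS, 3))
```

For the made-up rows, the valid 2-year windows are 2001 to 2002 and 2004 to 2005. A purely row-based window would also pair 2002 with 2004, which are not consecutive years.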

Note

The input in reproducible form is assumed to be:

dat <- 
structure(list(Year = c(2001L, 2002L, 2003L, 2004L, 2002L, 2003L, 
2004L, 2005L, 1998L, 1999L), Item = c("Trash Can", "Trash Can", 
"Trash Can", "Trash Can", "Red Balloon", "Red Balloon", "Red Balloon", 
"Red Balloon", "Blue Bottle", "Blue Bottle"), Sales = c(100L, 
125L, 90L, 97L, 23L, 309L, 67L, 8L, 600L, 565L)), row.names = c(NA, 
-10L), class = "data.frame")

