How to sum over different time intervals to find multi-year peaks

Question

I am trying to find the historical peak in sales of each item over several consecutive years. My problem is that some items were sold in the past and then discontinued, but they still need to be part of the analysis.

I've worked through some for loops in R, but I am unsure how to sum sales over multiple consecutive years and then compare each sum against the other local maxima within the same dataset. For example:

Year      Item            Sales
2001      Trash Can       100
2002      Trash Can       125
2003      Trash Can       90
2004      Trash Can       97
2002      Red Balloon     23
2003      Red Balloon     309
2004      Red Balloon     67
2005      Red Balloon     8
1998      Blue Bottle     600
1999      Blue Bottle     565

Based on the above data, if I wanted to calculate the 2-year peak of sales, I would want to output Blue Bottle 1165 (sum of 1998 and 1999), Red Balloon 376 (sum of 2003 and 2004) and Trash Can 225 (sum of 2001 and 2002). However, if I wanted a 3-year peak, Blue Bottle would be ineligible because it only has 2 years of data.

If I wanted to calculate the 3-year peak of sales, I would want to output Red Balloon 399 (sum of 2002 to 2004) and Trash Can 315 (sum of 2001 to 2003).
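The expected results above can be reproduced with a small, language-neutral sketch. It is written in Python only for illustration; the function name `k_year_peaks` and the tuple layout of `ROWS` are assumptions, not part of the question. A window counts only if every one of its k calendar years has a row for the item.

```python
from collections import defaultdict

# Sample data from the question: (year, item, sales)
ROWS = [
    (2001, "Trash Can", 100), (2002, "Trash Can", 125),
    (2003, "Trash Can", 90), (2004, "Trash Can", 97),
    (2002, "Red Balloon", 23), (2003, "Red Balloon", 309),
    (2004, "Red Balloon", 67), (2005, "Red Balloon", 8),
    (1998, "Blue Bottle", 600), (1999, "Blue Bottle", 565),
]

def k_year_peaks(rows, k):
    """Return {item: best total sales over k consecutive calendar years}."""
    by_item = defaultdict(dict)
    for year, item, sales in rows:
        by_item[item][year] = sales

    peaks = {}
    for item, sales_by_year in by_item.items():
        best = None
        for start in sales_by_year:
            window = range(start, start + k)
            # Skip windows that run past the item's data or span a missing year
            if all(y in sales_by_year for y in window):
                total = sum(sales_by_year[y] for y in window)
                if best is None or total > best:
                    best = total
        if best is not None:  # items with no complete window are ineligible
            peaks[item] = best
    return peaks

print(k_year_peaks(ROWS, 2))
# {'Trash Can': 225, 'Red Balloon': 376, 'Blue Bottle': 1165}
print(k_year_peaks(ROWS, 3))
# {'Trash Can': 315, 'Red Balloon': 399}
```

Blue Bottle drops out for k = 3 because it has no complete 3-year window.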

Answer 1

In SQL, you can use window functions. For eligible 2-year sales:

    select item, two_year_sales, year
    from (select t.*,
                 sum(sales) over (partition by item order by year rows between 1 preceding and current row) as two_year_sales,
                 row_number() over (partition by item order by year) as seqnum
          from t
         ) t
    where seqnum >= 2;

And to get the peak:

select t.*   
from (select item, two_year_sales, year,
             max(two_year_sales) over (partition by item) as max_two_year_sales
      from (select t.*,
                   sum(sales) over (partition by item order by year rows between 1 preceding and current row) as two_year_sales,
                   row_number() over (partition by item order by year) as seqnum
            from t
           ) t
      where seqnum >= 2
     ) t
where two_year_sales = max_two_year_sales;
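One caveat worth checking against your own data: a `rows between 1 preceding and current row` frame pairs adjacent rows, not adjacent calendar years. The sample data has no gaps, so the distinction does not matter there. The sketch below (Python for illustration; the helper names and the `gappy` series are hypothetical) shows what would differ if an item had a missing year.

```python
def two_row_sums(series):
    """Mimic ROWS BETWEEN 1 PRECEDING AND CURRENT ROW.

    `series` is a list of (year, sales) pairs sorted by year.
    Every pair of adjacent rows is summed, whatever the gap in years.
    """
    return [
        (cur_year, prev_sales + cur_sales)
        for (_, prev_sales), (cur_year, cur_sales) in zip(series, series[1:])
    ]

def two_year_sums(series):
    """Same pairing, but keep a sum only if the rows are one year apart."""
    return [
        (cur_year, prev_sales + cur_sales)
        for (prev_year, prev_sales), (cur_year, cur_sales) in zip(series, series[1:])
        if cur_year - prev_year == 1
    ]

# Hypothetical item with no row for 2003
gappy = [(2001, 100), (2002, 125), (2004, 97)]

print(two_row_sums(gappy))   # [(2002, 225), (2004, 222)]
print(two_year_sums(gappy))  # [(2002, 225)]
```

In SQL, the same guard could be expressed by comparing `year` with the previous row's year before accepting a window.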

Answer 2

A solution in R using the tidyverse and RcppRoll:

#Loading the packages and your data as a `tibble`
library("RcppRoll")
library("dplyr")

tbl <- tribble(
  ~Year,     ~Item,          ~Sales,
  2001,      "Trash Can",       100,
  2002,      "Trash Can",       125,
  2003,      "Trash Can",       90,
  2004,      "Trash Can",       97,
  2002,      "Red Balloon",     23,
  2003,      "Red Balloon",     309,
  2004,      "Red Balloon",      67,
  2005,      "Red Balloon",     8,
  1998,      "Blue Bottle",     600,
  1999,      "Blue Bottle",     565
)

# Set the number of consecutive years
n <- 2

# Compute the rolling sums (assumes rows are sorted by Year within each Item, with no missing years) and take max
res <- tbl %>% 
  group_by(Item) %>% 
  mutate(rollingsum = roll_sumr(Sales, n)) %>% 
  summarize(best_sum = max(rollingsum, na.rm = TRUE))
print(res)
## A tibble: 3 x 2
#  Item        best_sum
#  <chr>          <dbl>
#1 Blue Bottle     1165
#2 Red Balloon      376
#3 Trash Can        225

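Setting n <- 3 yields a different res. Blue Bottle has only two years of data, so all of its rolling sums are NA and max() with na.rm = TRUE returns -Inf (with a warning):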

print(res)
## A tibble: 3 x 2
#  Item        best_sum
#  <chr>          <dbl>
#1 Blue Bottle     -Inf
#2 Red Balloon      399
#3 Trash Can        315

Answer 3

I can only help with the SQL part: use GROUP BY with HAVING. The HAVING clause filters out every item that has fewer than a specified minimum number of years of historical data.

Check whether this query meets your requirements.

SELECT 
     item
     , count(*) as num_years
     , sum(Sales) as local_max 
from [your_table] 
where year between [year_ini] and [year_end]
group by item 
having count(*) >= [number_of_years]
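To make the WHERE / GROUP BY / HAVING logic concrete, here is a rough Python equivalent run on the question's sample data. The function name `range_totals` is an assumption, and its parameters mirror the bracketed placeholders in the query. Note that a fixed range is a single window, so finding the peak would mean running it for each candidate range.

```python
from collections import defaultdict

# Sample data from the question: (year, item, sales)
ROWS = [
    (2001, "Trash Can", 100), (2002, "Trash Can", 125),
    (2003, "Trash Can", 90), (2004, "Trash Can", 97),
    (2002, "Red Balloon", 23), (2003, "Red Balloon", 309),
    (2004, "Red Balloon", 67), (2005, "Red Balloon", 8),
    (1998, "Blue Bottle", 600), (1999, "Blue Bottle", 565),
]

def range_totals(rows, year_ini, year_end, number_of_years):
    """Total sales per item within [year_ini, year_end], keeping only
    items that have at least `number_of_years` rows in that range."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for year, item, sales in rows:
        if year_ini <= year <= year_end:  # WHERE year BETWEEN ...
            counts[item] += 1             # count(*)
            totals[item] += sales         # sum(Sales)
    # HAVING count(*) >= number_of_years
    return {item: totals[item] for item in totals
            if counts[item] >= number_of_years}

print(range_totals(ROWS, 2002, 2004, 3))
# {'Trash Can': 312, 'Red Balloon': 399}
```

For 2002 to 2004, Trash Can totals 312, whereas its 3-year peak of 315 comes from 2001 to 2003, which is why a single range is not enough on its own.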

Answer 4

Read the data dat (shown reproducibly in the Note at the end) into a zoo series with one column per Item, and then convert it to a ts series tt (which fills in the missing years with NA). Then use rollsumr to take the sum of every k consecutive years for each Item, find the maximum value for each Item, stack the result into a data frame and omit any NA rows. The function Max is like max(x, na.rm = TRUE), except that if x is all NAs it returns NA instead of -Inf and does not issue a warning. stack outputs the item column second, so reverse the columns using 2:1 and add nicer names.

library(zoo)

Max <- function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)

peak <- function(data, k) {
  tt <- as.ts(read.zoo(data, split = "Item"))
  s <- na.omit(stack(apply(rollsumr(tt, k), 2, Max)))
  setNames(s[2:1], c("Item", "Sum"))
}

peak(dat, 2)
##          Item  Sum
## 1 Blue Bottle 1165
## 2 Red Balloon  376
## 3   Trash Can  225

peak(dat, 3)
##          Item Sum
## 2 Red Balloon 399
## 3   Trash Can 315
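The idea behind this answer (one slot per calendar year, missing years marked as NA, and a maximum that tolerates all-NA input) can be sketched outside of zoo as follows. This is an illustration in Python under stated assumptions, not a description of how zoo works internally; `None` stands in for NA and all three helper names are hypothetical.

```python
def fill_years(sales_by_year, first, last):
    """One slot per year in [first, last]; None where the item has no row."""
    return [sales_by_year.get(year) for year in range(first, last + 1)]

def rolling_sums(values, k):
    """Sum of each run of k adjacent slots; None if the run has a missing year."""
    sums = []
    for i in range(len(values) - k + 1):
        window = values[i:i + k]
        sums.append(None if None in window else sum(window))
    return sums

def max_or_none(values):
    """Largest non-missing value, or None when everything is missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

# Blue Bottle over the full 1998-2005 span of the sample data
blue = fill_years({1998: 600, 1999: 565}, 1998, 2005)

print(rolling_sums(blue, 2))               # [1165, None, None, None, None, None, None]
print(max_or_none(rolling_sums(blue, 2)))  # 1165
print(max_or_none(rolling_sums(blue, 3)))  # None
```

Returning `None` rather than an infinite sentinel makes ineligible items easy to drop afterwards, which is the role `na.omit` plays in the R code.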

Note

The input in reproducible form is assumed to be:

dat <- 
structure(list(Year = c(2001L, 2002L, 2003L, 2004L, 2002L, 2003L, 
2004L, 2005L, 1998L, 1999L), Item = c("Trash Can", "Trash Can", 
"Trash Can", "Trash Can", "Red Balloon", "Red Balloon", "Red Balloon", 
"Red Balloon", "Blue Bottle", "Blue Bottle"), Sales = c(100L, 
125L, 90L, 97L, 23L, 309L, 67L, 8L, 600L, 565L)), row.names = c(NA, 
-10L), class = "data.frame")
