简体   繁体   English

如何在 R 中生成同比计算列

[英]How to produce a year over year calculated column in R

First, is the data and then the manipulations.首先是数据,然后是操作。 Finally, is the current method that I am using and as of yet is producing no data.最后,是我正在使用的当前方法,到目前为止还没有产生任何数据。 The manipulations are to create a date and then create a rolling 12-Month average.操作是创建一个日期,然后创建一个滚动的 12 个月平均值。

   Monthavg<- 
   c(20185,20186,20187,20188,20189,201810,201811,201812,20191,20192,20193,20194,20195,20196,
      20197,20198,20199,201910,201911,201912,20201
      ,20202,20203,20204,20205,20206,20207
      ,20208,20209,202010,202011)

  empavg<-c(2,4,6,7,8,10,12,14,16,18,20,22,24,26,28,30,32,36,36,38,40,42,44,46,48,48,50,52,52,54,56)

  ces12f <- data.frame(Monthavg,empavg)

Manipulations操作

 ces12f<- ces12f %>% mutate(year = substr(as.character(Monthavg),1,4),
              month = substr(as.character(Monthavg),5,7),
              date = as.Date(paste(year,month,"1",sep ="-")))
 Month_ord <- order(Monthavg)
 span_month=12
 ces12f<-ces12f %>% mutate(ravg = zoo::rollmeanr(empavg, 12, fill = NA))

Annual difference attempt年度差异尝试

 ces12f<- ces12f%>%
 group_by(Monthavg)%>%
 mutate(PreviousYear=lag(ravg,12), 
     PreviousMonth=lag(ravg),
     AnnualDifference=ravg-PreviousYear)%>%
 ungroup()

The end goal would be that 202011 minus 201911 or 47.5 minus 25.17 or 22.3.最终目标是 202011 减去 201911 或 47.5 减去 25.17 或 22.3。 The method that I use above produces nothing but NA's.我上面使用的方法只产生 NA。 Any insights as to how I can modify my existing code or simply use an entirely different method would be greatly appreciated.任何关于如何修改现有代码或简单地使用完全不同的方法的见解将不胜感激。

I tend to be a little more paranoid.我倾向于多一点偏执。 That is, if there is even a slight chance that we are missing one month of however many years we have, than doing a lag(..., 12) is a bad idea, even worse because you will get no warnings or errors, and your data will be wrong.也就是说,如果我们错过了我们所拥有的多年的一个月的机会,那么做一个lag(..., 12)是一个坏主意,更糟糕的是因为你不会收到任何警告或错误,并且您的数据将是错误的。

As such, I'm going to recommend a self-join.因此,我将推荐一个自我加入。

transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
  left_join(ces12f, ., by = c("year", "month"))
#    Monthavg empavg year month       date     ravg lastravg
# 1     20185      2 2018     5 2018-05-01       NA       NA
# 2     20186      4 2018     6 2018-06-01       NA       NA
# 3     20187      6 2018     7 2018-07-01       NA       NA
# 4     20188      7 2018     8 2018-08-01       NA       NA
# 5     20189      8 2018     9 2018-09-01       NA       NA
# 6    201810     10 2018    10 2018-10-01       NA       NA
# 7    201811     12 2018    11 2018-11-01       NA       NA
# 8    201812     14 2018    12 2018-12-01       NA       NA
# 9     20191     16 2019     1 2019-01-01       NA       NA
# 10    20192     18 2019     2 2019-02-01       NA       NA
# 11    20193     20 2019     3 2019-03-01       NA       NA
# 12    20194     22 2019     4 2019-04-01 11.58333       NA
# 13    20195     24 2019     5 2019-05-01 13.41667       NA
# 14    20196     26 2019     6 2019-06-01 15.25000       NA
# 15    20197     28 2019     7 2019-07-01 17.08333       NA
# 16    20198     30 2019     8 2019-08-01 19.00000       NA
# 17    20199     32 2019     9 2019-09-01 21.00000       NA
# 18   201910     36 2019    10 2019-10-01 23.16667       NA
# 19   201911     36 2019    11 2019-11-01 25.16667       NA
# 20   201912     38 2019    12 2019-12-01 27.16667       NA
# 21    20201     40 2020     1 2020-01-01 29.16667       NA
# 22    20202     42 2020     2 2020-02-01 31.16667       NA
# 23    20203     44 2020     3 2020-03-01 33.16667       NA
# 24    20204     46 2020     4 2020-04-01 35.16667 11.58333
# 25    20205     48 2020     5 2020-05-01 37.16667 13.41667
# 26    20206     48 2020     6 2020-06-01 39.00000 15.25000
# 27    20207     50 2020     7 2020-07-01 40.83333 17.08333
# 28    20208     52 2020     8 2020-08-01 42.66667 19.00000
# 29    20209     52 2020     9 2020-09-01 44.33333 21.00000
# 30   202010     54 2020    10 2020-10-01 45.83333 23.16667
# 31   202011     56 2020    11 2020-11-01 47.50000 25.16667

You can verify that each lastempavg is the previous year's value, and you can mutate the difference normally, perhaps您可以验证每个lastempavg是上一年的值,并且您可以正常地mutate差异,也许

transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
  left_join(ces12f, ., by = c("year", "month")) %>%
  mutate(AnnualDifference = ravg - lastravg)
#    Monthavg empavg year month       date     ravg lastravg AnnualDifference
# 1     20185      2 2018     5 2018-05-01       NA       NA               NA
# 2     20186      4 2018     6 2018-06-01       NA       NA               NA
# 3     20187      6 2018     7 2018-07-01       NA       NA               NA
# 4     20188      7 2018     8 2018-08-01       NA       NA               NA
# 5     20189      8 2018     9 2018-09-01       NA       NA               NA
# 6    201810     10 2018    10 2018-10-01       NA       NA               NA
# 7    201811     12 2018    11 2018-11-01       NA       NA               NA
# 8    201812     14 2018    12 2018-12-01       NA       NA               NA
# 9     20191     16 2019     1 2019-01-01       NA       NA               NA
# 10    20192     18 2019     2 2019-02-01       NA       NA               NA
# 11    20193     20 2019     3 2019-03-01       NA       NA               NA
# 12    20194     22 2019     4 2019-04-01 11.58333       NA               NA
# 13    20195     24 2019     5 2019-05-01 13.41667       NA               NA
# 14    20196     26 2019     6 2019-06-01 15.25000       NA               NA
# 15    20197     28 2019     7 2019-07-01 17.08333       NA               NA
# 16    20198     30 2019     8 2019-08-01 19.00000       NA               NA
# 17    20199     32 2019     9 2019-09-01 21.00000       NA               NA
# 18   201910     36 2019    10 2019-10-01 23.16667       NA               NA
# 19   201911     36 2019    11 2019-11-01 25.16667       NA               NA
# 20   201912     38 2019    12 2019-12-01 27.16667       NA               NA
# 21    20201     40 2020     1 2020-01-01 29.16667       NA               NA
# 22    20202     42 2020     2 2020-02-01 31.16667       NA               NA
# 23    20203     44 2020     3 2020-03-01 33.16667       NA               NA
# 24    20204     46 2020     4 2020-04-01 35.16667 11.58333         23.58333
# 25    20205     48 2020     5 2020-05-01 37.16667 13.41667         23.75000
# 26    20206     48 2020     6 2020-06-01 39.00000 15.25000         23.75000
# 27    20207     50 2020     7 2020-07-01 40.83333 17.08333         23.75000
# 28    20208     52 2020     8 2020-08-01 42.66667 19.00000         23.66667
# 29    20209     52 2020     9 2020-09-01 44.33333 21.00000         23.33333
# 30   202010     54 2020    10 2020-10-01 45.83333 23.16667         22.66667
# 31   202011     56 2020    11 2020-11-01 47.50000 25.16667         22.33333

Side note on this: it might be better to keep the year and month stored as integer , for a few reasons: (1) it makes this kind of thing quite easy;旁注:最好将年份和月份存储为integer ,原因如下:(1)它使这种事情变得很容易; (2) it preserves ordinality, whereas arrange(ces12f, month) will happily order the months as 1, 10, 11, 12, 2, etc; (2) 它保持序数,而arrange(ces12f, month)将愉快地将月份排序为1、10、11、12、2等; (3) (subjective) they really are integers, after all. (3)(主观)毕竟它们确实是整数。

Here's an approach with tidyr::extract .这是tidyr::extract的一种方法。 You can use tidyr::complete to ensure any missing months are filled in:您可以使用tidyr::complete来确保填写任何缺失的月份:

library(tidyverse)
library(zoo)
ces12f %>%
  mutate(Monthavg = as.character(Monthavg)) %>%
  extract(Monthavg, into = c("Year", "Month"),
          regex = "^([0-9]{4})([0-9]{1,2})$") %>%
  mutate(across(Year:Month, as.integer)) %>%
  arrange(Year,Month) %>%
  complete(Year, Month) %>%
  mutate(ravg = zoo::rollmeanr(empavg,12,NA)) %>%
  mutate(PreviousYear=lag(ravg,12), 
         PreviousMonth=lag(ravg),
         AnnualDifference=ravg-PreviousYear)
   Year Month empavg     ravg PreviousYear PreviousMonth AnnualDifference
1  2018     1     NA       NA           NA            NA               NA
2  2018     2     NA       NA           NA            NA               NA
3  2018     3     NA       NA           NA            NA               NA
4  2018     4     NA       NA           NA            NA               NA
5  2018     5      2       NA           NA            NA               NA
6  2018     6      4       NA           NA            NA               NA
7  2018     7      6       NA           NA            NA               NA
8  2018     8      7       NA           NA            NA               NA
9  2018     9      8       NA           NA            NA               NA
10 2018    10     10       NA           NA            NA               NA
11 2018    11     12       NA           NA            NA               NA
12 2018    12     14       NA           NA            NA               NA
13 2019     1     16       NA           NA            NA               NA
14 2019     2     18       NA           NA            NA               NA
15 2019     3     20       NA           NA            NA               NA
16 2019     4     22 11.58333           NA            NA               NA
17 2019     5     24 13.41667           NA      11.58333               NA
18 2019     6     26 15.25000           NA      13.41667               NA
19 2019     7     28 17.08333           NA      15.25000               NA
20 2019     8     30 19.00000           NA      17.08333               NA
21 2019     9     32 21.00000           NA      19.00000               NA
22 2019    10     36 23.16667           NA      21.00000               NA
23 2019    11     36 25.16667           NA      23.16667               NA
24 2019    12     38 27.16667           NA      25.16667               NA
25 2020     1     40 29.16667           NA      27.16667               NA
26 2020     2     42 31.16667           NA      29.16667               NA
27 2020     3     44 33.16667           NA      31.16667               NA
28 2020     4     46 35.16667     11.58333      33.16667         23.58333
29 2020     5     48 37.16667     13.41667      35.16667         23.75000
30 2020     6     48 39.00000     15.25000      37.16667         23.75000
31 2020     7     50 40.83333     17.08333      39.00000         23.75000
32 2020     8     52 42.66667     19.00000      40.83333         23.66667
33 2020     9     52 44.33333     21.00000      42.66667         23.33333
34 2020    10     54 45.83333     23.16667      44.33333         22.66667
35 2020    11     56 47.50000     25.16667      45.83333         22.33333
36 2020    12     NA       NA     27.16667      47.50000               NA

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM