How to produce a year-over-year calculated column in R
First is the data, then the manipulations. Last is the method I am currently using, which so far produces no data. The manipulations create a date and then a rolling 12-month average.
Monthavg<-
c(20185,20186,20187,20188,20189,201810,201811,201812,20191,20192,20193,20194,20195,20196,
20197,20198,20199,201910,201911,201912,20201
,20202,20203,20204,20205,20206,20207
,20208,20209,202010,202011)
empavg<-c(2,4,6,7,8,10,12,14,16,18,20,22,24,26,28,30,32,36,36,38,40,42,44,46,48,48,50,52,52,54,56)
ces12f <- data.frame(Monthavg,empavg)
Manipulations
library(dplyr)
ces12f<- ces12f %>% mutate(year = substr(as.character(Monthavg),1,4),
month = substr(as.character(Monthavg),5,7),
date = as.Date(paste(year,month,"1",sep ="-")))
Month_ord <- order(Monthavg)
span_month=12
ces12f<-ces12f %>% mutate(ravg = zoo::rollmeanr(empavg, 12, fill = NA))
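As a quick sanity check on the rolling average, here is a minimal sketch that reuses the empavg values from above. It assumes the zoo package is installed; the name r is illustrative only. A right-aligned 12-month window leaves the first 11 entries as NA, and entry 12 should equal the plain mean of the first 12 values.

```r
library(zoo)

# Same values as empavg in the question
empavg <- c(2,4,6,7,8,10,12,14,16,18,20,22,24,26,28,30,32,36,36,38,
            40,42,44,46,48,48,50,52,52,54,56)

# Right-aligned rolling mean over 12 observations, padded with NA at the start
r <- zoo::rollmeanr(empavg, 12, fill = NA)

# Entry 12 is the mean of observations 1 to 12
r[12]
mean(empavg[1:12])

# The last entry is the mean of the final 12 observations
r[length(r)]
```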
Annual difference attempt
ces12f<- ces12f%>%
group_by(Monthavg)%>%
mutate(PreviousYear=lag(ravg,12),
PreviousMonth=lag(ravg),
AnnualDifference=ravg-PreviousYear)%>%
ungroup()
The end goal is the value for 202011 minus the value for 201911, that is 47.5 minus 25.17, or 22.3. The method I use above produces nothing but NAs. Any insight into how I can modify my existing code, or simply use an entirely different method, would be greatly appreciated.
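A likely reason the attempt returns only NA: Monthavg is unique on every row, so group_by(Monthavg) creates one-row groups and lag() never has an earlier row to return. The minimal sketch below shows this on hypothetical toy data; the names toy, key, grouped, and ungrouped are illustrative and not part of the original code.

```r
library(dplyr)

# Toy data: one row per unique key, mirroring the shape of ces12f
toy <- data.frame(key = 1:15, ravg = as.numeric(1:15))

# Grouped by a unique key, every group holds a single row,
# so lag() has nothing to look back on and returns NA everywhere
grouped <- toy %>%
  group_by(key) %>%
  mutate(prev = lag(ravg, 12)) %>%
  ungroup()

# Without the grouping, lag() looks back 12 rows across the whole column
ungrouped <- toy %>%
  mutate(prev = lag(ravg, 12))
```

Dropping the group_by() call removes the all-NA result, although a positional lag still assumes that no month is missing.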
I tend to be a little more paranoid. That is, if there is even a slight chance that a month is missing somewhere in however many years we have, then doing a lag(..., 12) is a bad idea. It is even worse because you will get no warnings or errors, and your data will simply be wrong.
As such, I'm going to recommend a self-join.
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
left_join(ces12f, ., by = c("year", "month"))
# Monthavg empavg year month date ravg lastravg
# 1 20185 2 2018 5 2018-05-01 NA NA
# 2 20186 4 2018 6 2018-06-01 NA NA
# 3 20187 6 2018 7 2018-07-01 NA NA
# 4 20188 7 2018 8 2018-08-01 NA NA
# 5 20189 8 2018 9 2018-09-01 NA NA
# 6 201810 10 2018 10 2018-10-01 NA NA
# 7 201811 12 2018 11 2018-11-01 NA NA
# 8 201812 14 2018 12 2018-12-01 NA NA
# 9 20191 16 2019 1 2019-01-01 NA NA
# 10 20192 18 2019 2 2019-02-01 NA NA
# 11 20193 20 2019 3 2019-03-01 NA NA
# 12 20194 22 2019 4 2019-04-01 11.58333 NA
# 13 20195 24 2019 5 2019-05-01 13.41667 NA
# 14 20196 26 2019 6 2019-06-01 15.25000 NA
# 15 20197 28 2019 7 2019-07-01 17.08333 NA
# 16 20198 30 2019 8 2019-08-01 19.00000 NA
# 17 20199 32 2019 9 2019-09-01 21.00000 NA
# 18 201910 36 2019 10 2019-10-01 23.16667 NA
# 19 201911 36 2019 11 2019-11-01 25.16667 NA
# 20 201912 38 2019 12 2019-12-01 27.16667 NA
# 21 20201 40 2020 1 2020-01-01 29.16667 NA
# 22 20202 42 2020 2 2020-02-01 31.16667 NA
# 23 20203 44 2020 3 2020-03-01 33.16667 NA
# 24 20204 46 2020 4 2020-04-01 35.16667 11.58333
# 25 20205 48 2020 5 2020-05-01 37.16667 13.41667
# 26 20206 48 2020 6 2020-06-01 39.00000 15.25000
# 27 20207 50 2020 7 2020-07-01 40.83333 17.08333
# 28 20208 52 2020 8 2020-08-01 42.66667 19.00000
# 29 20209 52 2020 9 2020-09-01 44.33333 21.00000
# 30 202010 54 2020 10 2020-10-01 45.83333 23.16667
# 31 202011 56 2020 11 2020-11-01 47.50000 25.16667
You can verify that each lastravg is the previous year's ravg, and you can mutate the difference as usual, perhaps:
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
left_join(ces12f, ., by = c("year", "month")) %>%
mutate(AnnualDifference = ravg - lastravg)
# Monthavg empavg year month date ravg lastravg AnnualDifference
# 1 20185 2 2018 5 2018-05-01 NA NA NA
# 2 20186 4 2018 6 2018-06-01 NA NA NA
# 3 20187 6 2018 7 2018-07-01 NA NA NA
# 4 20188 7 2018 8 2018-08-01 NA NA NA
# 5 20189 8 2018 9 2018-09-01 NA NA NA
# 6 201810 10 2018 10 2018-10-01 NA NA NA
# 7 201811 12 2018 11 2018-11-01 NA NA NA
# 8 201812 14 2018 12 2018-12-01 NA NA NA
# 9 20191 16 2019 1 2019-01-01 NA NA NA
# 10 20192 18 2019 2 2019-02-01 NA NA NA
# 11 20193 20 2019 3 2019-03-01 NA NA NA
# 12 20194 22 2019 4 2019-04-01 11.58333 NA NA
# 13 20195 24 2019 5 2019-05-01 13.41667 NA NA
# 14 20196 26 2019 6 2019-06-01 15.25000 NA NA
# 15 20197 28 2019 7 2019-07-01 17.08333 NA NA
# 16 20198 30 2019 8 2019-08-01 19.00000 NA NA
# 17 20199 32 2019 9 2019-09-01 21.00000 NA NA
# 18 201910 36 2019 10 2019-10-01 23.16667 NA NA
# 19 201911 36 2019 11 2019-11-01 25.16667 NA NA
# 20 201912 38 2019 12 2019-12-01 27.16667 NA NA
# 21 20201 40 2020 1 2020-01-01 29.16667 NA NA
# 22 20202 42 2020 2 2020-02-01 31.16667 NA NA
# 23 20203 44 2020 3 2020-03-01 33.16667 NA NA
# 24 20204 46 2020 4 2020-04-01 35.16667 11.58333 23.58333
# 25 20205 48 2020 5 2020-05-01 37.16667 13.41667 23.75000
# 26 20206 48 2020 6 2020-06-01 39.00000 15.25000 23.75000
# 27 20207 50 2020 7 2020-07-01 40.83333 17.08333 23.75000
# 28 20208 52 2020 8 2020-08-01 42.66667 19.00000 23.66667
# 29 20209 52 2020 9 2020-09-01 44.33333 21.00000 23.33333
# 30 202010 54 2020 10 2020-10-01 45.83333 23.16667 22.66667
# 31 202011 56 2020 11 2020-11-01 47.50000 25.16667 22.33333
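To illustrate the concern about a missing month, here is a minimal sketch on hypothetical data (the names toy, by_lag, and by_join are illustrative only). One month is deliberately removed. The positional lag then silently pairs some rows with the wrong month, while the join on year and month either finds the right row or returns NA.

```r
library(dplyr)

# Hypothetical monthly series for 2019-2020; value 1 is 2019-01, value 24 is 2020-12
toy <- data.frame(year  = rep(2019:2020, each = 12),
                  month = rep(1:12, times = 2),
                  value = 1:24)

# Deliberately drop 2019-06 to simulate a gap
toy <- toy %>% filter(!(year == 2019 & month == 6))

# Positional lag: 12 rows back is no longer always 12 months back
by_lag <- toy %>% mutate(prev = lag(value, 12))

# Self-join: shift the year forward by one and match on the calendar key
by_join <- toy %>%
  transmute(year = year + 1L, month, prev = value) %>%
  left_join(toy, ., by = c("year", "month"))

# 2020-02 should be paired with 2019-02 (value 2)
by_lag[by_lag$year == 2020 & by_lag$month == 2, ]
by_join[by_join$year == 2020 & by_join$month == 2, ]
```

In this sketch the lag version pairs 2020-02 with 2019-01 and raises no warning, whereas the join pairs it with 2019-02 and leaves 2020-06 as NA because 2019-06 does not exist.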
Side note on this: it might be better to keep the year and month stored as integer, for a few reasons: (1) it makes this kind of thing quite easy; (2) it preserves ordinality, whereas arrange(ces12f, month) will happily order the months as 1, 10, 11, 12, 2, etc.; (3) (subjective) they really are integers, after all.
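The ordering point can be seen with a small sketch in base R (the names months_chr, chr_sorted, and int_sorted are illustrative only):

```r
# Months stored as character, as substr() produces them
months_chr <- as.character(1:12)

# Character sorting compares text, so "10" lands before "2"
chr_sorted <- sort(months_chr)
head(chr_sorted, 5)

# Integer sorting keeps calendar order
int_sorted <- sort(as.integer(months_chr))
int_sorted
```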
Here's an approach with tidyr::extract. You can use tidyr::complete to ensure any missing months are filled in:
library(tidyverse)
library(zoo)
ces12f %>%
mutate(Monthavg = as.character(Monthavg)) %>%
extract(Monthavg, into = c("Year", "Month"),
regex = "^([0-9]{4})([0-9]{1,2})$") %>%
mutate(across(Year:Month, as.integer)) %>%
arrange(Year,Month) %>%
complete(Year, Month) %>%
mutate(ravg = zoo::rollmeanr(empavg,12,NA)) %>%
mutate(PreviousYear=lag(ravg,12),
PreviousMonth=lag(ravg),
AnnualDifference=ravg-PreviousYear)
Year Month empavg ravg PreviousYear PreviousMonth AnnualDifference
1 2018 1 NA NA NA NA NA
2 2018 2 NA NA NA NA NA
3 2018 3 NA NA NA NA NA
4 2018 4 NA NA NA NA NA
5 2018 5 2 NA NA NA NA
6 2018 6 4 NA NA NA NA
7 2018 7 6 NA NA NA NA
8 2018 8 7 NA NA NA NA
9 2018 9 8 NA NA NA NA
10 2018 10 10 NA NA NA NA
11 2018 11 12 NA NA NA NA
12 2018 12 14 NA NA NA NA
13 2019 1 16 NA NA NA NA
14 2019 2 18 NA NA NA NA
15 2019 3 20 NA NA NA NA
16 2019 4 22 11.58333 NA NA NA
17 2019 5 24 13.41667 NA 11.58333 NA
18 2019 6 26 15.25000 NA 13.41667 NA
19 2019 7 28 17.08333 NA 15.25000 NA
20 2019 8 30 19.00000 NA 17.08333 NA
21 2019 9 32 21.00000 NA 19.00000 NA
22 2019 10 36 23.16667 NA 21.00000 NA
23 2019 11 36 25.16667 NA 23.16667 NA
24 2019 12 38 27.16667 NA 25.16667 NA
25 2020 1 40 29.16667 NA 27.16667 NA
26 2020 2 42 31.16667 NA 29.16667 NA
27 2020 3 44 33.16667 NA 31.16667 NA
28 2020 4 46 35.16667 11.58333 33.16667 23.58333
29 2020 5 48 37.16667 13.41667 35.16667 23.75000
30 2020 6 48 39.00000 15.25000 37.16667 23.75000
31 2020 7 50 40.83333 17.08333 39.00000 23.75000
32 2020 8 52 42.66667 19.00000 40.83333 23.66667
33 2020 9 52 44.33333 21.00000 42.66667 23.33333
34 2020 10 54 45.83333 23.16667 44.33333 22.66667
35 2020 11 56 47.50000 25.16667 45.83333 22.33333
36 2020 12 NA NA 27.16667 47.50000 NA
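One caveat that may be worth checking: complete(Year, Month) expands only to combinations of values that already appear in the data, so a month that is absent from every year would not be added. The sketch below, on hypothetical toy data (toy, filled, and filled_full are illustrative names), shows the difference when the full month range is supplied explicitly.

```r
library(dplyr)
library(tidyr)

# Hypothetical data in which month 2 never appears in any year
toy <- data.frame(Year  = c(2019L, 2019L, 2020L),
                  Month = c(1L, 3L, 1L),
                  value = c(10, 30, 50))

# Expands to observed Years x observed Months only: month 2 is still absent
filled <- toy %>% complete(Year, Month)

# Supplying the full range guarantees 12 rows per year
filled_full <- toy %>% complete(Year, Month = 1:12)

nrow(filled)
nrow(filled_full)
```

With the data in the question every month from 1 to 12 occurs in at least one year, so the answer's call fills the grid as shown above. Passing Month = 1:12 is a defensive variant, not a change the answer requires.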