
Shifting of running mean output from the rollapply function in R

I am trying to plot a time series with its corresponding 9-year running mean. I am using the rollapply function from the "zoo" package.

I don't know why the "running mean" time series is not aligned properly even though I change the "align" in the function.

The time series runs from 1961 to 2009.

Here's the data that I am using:

structure(list(Year = 1961:2009, Rain = c(7.6656130268, 8.1981182796, 
14.4514275121, 13.1530337942, 9.2569892473, 14.1592933948, 10.8212829069, 
3.2401689708, 14.5850998464, 9.614093702, 13.1677048572, 4.7452764977, 
20.7346774194, 9.3896697389, 21.9528735632, 22.5482334869, 6.0696620584, 
7.100640041, 4.706154987, 7.9103302611, 9.9548387097, 8.0649001536, 
6.2932888395, 3.8337173579, 23.5, 2.4107142857, 14.7172784575, 
9.7700076805, 7.6785330261, 7.5453917051, 8.8073044123, 7.7576420891, 
17.0896697389, 10.2380952381, 19.1981460882, 7.0900537634, 5.0630184332, 
22.1928955453, 17.3850945495, 14.71593702, 12.7344086022, 6.0408602151, 
8.0338524286, 7.1766513057, 21.8706989247, 10.6695852535, 21.4467185762, 
10.5718894009, 3.9693548387)), .Names = c("Year", "Rain"), class = 
"data.frame", row.names = c(NA, 
-49L))

Here's my script:

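library(zoo)      # provides rollapply()
library(ggplot2)  # provides ggplot() and the theme/scale functions

dat <- read.csv("test.csv", header = TRUE, sep = ",")
dat[dat == -999] <- NA   # recode -999 as NA
dat[dat == -888] <- 0    # recode -888 as 0
dat <- data.frame(dat)

# 9-year running mean; align = "right" stores each mean at the window's last year
dat$mav <- rollapply(dat$Rain, width = 9, mean, fill = NA, align = "right")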


p <- ggplot(dat, aes(x = Year))
p <- p + geom_line(aes(y = Rain,color="test"))
p <- p + geom_point(aes(y = Rain,color="test"),size=1)
p <- p + geom_line(aes(y=mav, color = "9-year running mean") , lwd = 1)
p <- p + theme(panel.background=element_rect(fill="white"),
         plot.margin = unit(c(0.5,0.5,0.5,0.5),"cm"),
         panel.border=element_rect(colour="black",fill=NA,size=1),
         axis.line.x=element_line(colour="black"),
         axis.line.y=element_line(colour="black"),
         axis.text=element_text(size=20,colour="black",family="serif"),
         axis.title=element_text(size=15,colour="black",family="serif"),
         legend.position = "top")
p <- p + scale_colour_manual(name="",values=c("test"="steelblue4","9-year running mean"="green"))
p <- p + scale_y_continuous(breaks=seq(0,50, by=10),limits=c(0,50), expand=c(0,0))
p <- p + scale_x_continuous(breaks = seq(1961, 2009, 9), expand = c(0, 0))  # Year is numeric, so use a continuous scale
p <- p + labs(x="Year",y="Rainfall(mm/day)")

Here's the output image: [output image]

What I am expecting:

[a] The time series of the running average should start at 1969 and the last value should be at 2000. But in the output image, the time series is shifted to the right and ends at 2009.

[b] When I set the 'align' to "center", the running mean starts at 1965.

[c] Any suggestion on how to do this correctly in R?

I think you might be misunderstanding how the width, fill, and alignment work in a rolling apply.

vec <- 1:10
rollapply(vec, 5, mean, fill=NA, align='right')
#  [1] NA NA NA NA  3  4  5  6  7  8

It first takes the first n=5 values and calculates their mean:

mean(vec[1:5])
# [1] 3

Where to put it? Since we said align='right', it places it in the right-most spot of the window, so index 5.

#  [1]  1  2  3  4  5  6  7  8  9 10
#                   ^
#                   3

and since you said fill=NA, it keeps the preceding spaces and populates them with NA:

#  [1]  1  2  3  4  5  6  7  8  9 10
#       ^  ^  ^  ^
#  [1] NA NA NA NA  3

For the next iteration, it takes the mean of the 2nd through 6th position:

mean(vec[2:6])
# [1] 4

which it then places in the 6th position:

#  [1]  1  2  3  4  5  6  7  8  9 10
#                      ^
#  [1] NA NA NA NA  3  4

When we get to the last iteration, we are calculating positions len-n+1 (10-5+1=6) through len (10), so

mean(vec[6:10])
# [1] 8

so it is put in the last position

#  [1]  1  2  3  4  5  6  7  8  9 10
#                                  ^
#  [1] NA NA NA NA  3  4  5  6  7  8
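The iterations above can be reproduced by hand. This is only a sketch of the logic for align='right' with fill=NA, written in base R for illustration; it is not how zoo implements rollapply internally, and the names n and out are made up for this example.

```r
vec <- 1:10
n <- 5                                  # window width

# Start with all NA, which plays the role of fill = NA
out <- rep(NA_real_, length(vec))

# For each window ending at position i, store the mean at i (right alignment)
for (i in n:length(vec)) {
  out[i] <- mean(vec[(i - n + 1):i])
}

out
#  [1] NA NA NA NA  3  4  5  6  7  8
```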

So, because we had width=5 and fill=NA, we will have 5-1=4 spaces filled with NA. (There might be more if there were any NAs in the data.) Had we chosen width=5 without fill instead, those 5-1=4 spaces would simply be missing from the result, meaning

# [1] 3 4 5 6 7 8
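As a quick check of the no-fill case, the call below drops the fill argument and compares lengths. The variable name res is just for this example.

```r
library(zoo)

vec <- 1:10

# No fill: only the complete windows are returned, so the result is shorter
res <- rollapply(vec, 5, mean, align = "right")

length(vec)   # 10
length(res)   # 6, i.e. 10 - (5 - 1)
```

The shorter result is why assigning it to a data frame column fails without fill: the column would need one value per row.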

Had we done width=5, fill=NA, align='left', then we should see:

rollapply(vec, 5, mean, fill=NA, align='left')
#  [1]  3  4  5  6  7  8 NA NA NA NA

because we asked for NAs instead of removal, and we said to put each value in the left-most position of each window of width 5. The last iteration (mean(vec[6:10]), with a value of 8) was put in the left-most position of the last window of width 5, meaning the four spaces to its right have no complete window and are filled with NA.
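To relate this to the question, here is a sketch that finds which years receive a non-NA value for each alignment when width=9 and the years run from 1961 to 2009. The values in x are placeholders rather than the rainfall data, since only the positions of the NAs matter here; years, x, and span are names made up for this example.

```r
library(zoo)

years <- 1961:2009        # same span as the data in the question
x <- seq_along(years)     # placeholder values, not the real rainfall

# First and last year that get a 9-year mean under each alignment
span <- sapply(c("left", "center", "right"), function(a) {
  m <- rollapply(x, width = 9, FUN = mean, fill = NA, align = a)
  range(years[!is.na(m)])
})

span
#      left center right
# [1,] 1961   1965  1969
# [2,] 2001   2005  2009
```

So with align='right' the mean of 1961 to 1969 is stored at 1969 and the series runs to 2009, while align='center' stores that same mean at 1965. The values are identical in each case; only the year they are attached to changes.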
