
Grouping data by date range in R

My dataframe has data in the following format from 1983-2008:

Year, temp,
1983, .109,
1984, .091,
1985, -.10,
1986, .051,
1987, -.071,
1988, .101,
1989, .003,
1990, -.051,
1991, -.110,
1992, .134,
1993, .091,
1994, .122,
1995, .101,
1996, .087,
1997, .075,

Is there a way to plot this data in a scatter plot so that each point is the average value over a 5-year window? For example, the first point's x value would be 1983-1987 and its y value would be the average temp for those years. I have looked into the aggregate function with mean as the third argument, but it does not support a date range.

We can use zoo's rollmean function to calculate the rolling average and ggplot2 to plot it.

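library(dplyr)
library(ggplot2)

# The zoo package must be installed; it is called via zoo:: below.
df %>%
  # Left-aligned 5-year mean: each row averages that year and the next four.
  mutate(temp_rolling_avg = zoo::rollmean(temp, 5, align = 'left', fill = NA),
         Year_label = paste(Year, lead(Year, 4), sep = '-')) %>%
  # Drop the last four rows, which have no complete 5-year window.
  filter(!is.na(temp_rolling_avg)) %>%
  ggplot(aes(Year_label, temp_rolling_avg)) +
  geom_col()  # draws bars; use geom_point() for a scatter plot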


You can use embed and rowMeans (here x holds the centre year of each 5-year window in the sample data):

x = 1985 : 1995
y = rowMeans(embed(df$temp,5))
plot(x, y)
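
To see why this works: embed(x, 5) returns a matrix with one row per run of 5 consecutive values (stored in reverse order within each row), so rowMeans gives one mean per window. The following is a minimal sketch using the sample rows from the question. The names k, windows, first_year, labels and result are illustrative and not part of the original answer.

```r
# Sample data from the question (1983-1997).
df <- data.frame(
  Year = 1983:1997,
  temp = c(.109, .091, -.100, .051, -.071, .101, .003, -.051,
           -.110, .134, .091, .122, .101, .087, .075)
)

k <- 5                          # window size in years
windows <- embed(df$temp, k)    # one row per window, values in reverse order
y <- rowMeans(windows)          # mean of each 5-year window

# Label each window by its first and last year instead of its centre year.
first_year <- df$Year[1:(nrow(df) - k + 1)]
labels <- paste(first_year, first_year + k - 1, sep = "-")

result <- data.frame(Years = labels, temp = y)
print(result)
```

With 15 input rows this yields 11 windows, from 1983-1987 to 1993-1997.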

Use a rolling mean function. This is a base R solution:

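vec is your object, len is the window size, and prtl controls whether a partial mean (over the values available so far) is returned for the first len - 1 positions instead of NA. Note that this function has the same name as zoo's rollmean and will mask it if zoo is attached.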

rollmean <- function(vec, len, prtl = FALSE) {
  if (len > length(vec)) {
    stop(paste("Choose lower range,", len, ">", length(vec)))
  } else {
    if (prtl) {
      # Partial mode: average whatever is available until the window is full.
      sapply(1:length(vec), function(i) {
        if (i <= len) {
          mean(vec[1:i])
        } else {
          mean(vec[(i - (len - 1)):i])
        }
      })
    } else {
      # Strict mode: NA until a full window of len values is available.
      sapply(1:length(vec), function(i) {
        if (i - (len - 1) > 0) {
          mean(vec[(i - (len - 1)):i])
        } else {
          NA
        }
      })
    }
  }
}
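
The effect of prtl is easiest to see on a small vector. The sketch below restates the same trailing-window logic compactly so that it runs on its own. roll_mean_sketch is a hypothetical name used only for this illustration, and the input vector is made up.

```r
# Hypothetical compact restatement of the trailing-window mean above.
roll_mean_sketch <- function(vec, len, prtl = FALSE) {
  stopifnot(len <= length(vec))
  sapply(seq_along(vec), function(i) {
    if (i >= len) {
      mean(vec[(i - len + 1):i])   # full window ending at position i
    } else if (prtl) {
      mean(vec[1:i])               # partial window: everything up to i
    } else {
      NA                           # not enough values yet
    }
  })
}

x <- c(2, 4, 6, 8, 10)             # made-up input
full    <- roll_mean_sketch(x, 3)
partial <- roll_mean_sketch(x, 3, prtl = TRUE)

print(full)     # NA NA 4 6 8
print(partial)  # 2 3 4 6 8
```

Both modes agree once a full window is available; they differ only in the first len - 1 positions.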

To get the data, use it like this:

dat
   Year   temp
1  1983  0.109
2  1984  0.091
3  1985 -0.100
...

mydat <- setNames( data.frame( paste( rollmean(dat$Year,5) - 2, 
           rollmean(dat$Year,5) + 2, sep="-" ),
           rollmean(dat$temp,5) ), colnames(dat) )

mydat
        Year    temp
1      NA-NA      NA
2      NA-NA      NA
3      NA-NA      NA
4      NA-NA      NA
5  1983-1987  0.0160
6  1984-1988  0.0144
7  1985-1989 -0.0032
8  1986-1990  0.0066
9  1987-1991 -0.0256
10 1988-1992  0.0154
11 1989-1993  0.0134
12 1990-1994  0.0372
13 1991-1995  0.0676
14 1992-1996  0.1070
15 1993-1997  0.0952

Plotting the data, e.g. as a bar plot (use geom_point( aes( Year, temp )) instead for a scatter plot):

require(ggplot2)
ggplot( mydat ) + geom_bar( aes( Year, temp, fill=Year ), stat="identity" ) +
    theme(axis.text = element_text(size = 6))


Omit the NA rows simply by using mydat[!is.na(mydat[,2]),]
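
The answers above all compute overlapping (rolling) windows. If the question is instead read as asking for non-overlapping 5-year groups, aggregate can still be used once each year has been assigned to a period. The following is a minimal sketch under that assumption, using the sample rows from the question. The names width, start, block, period and binned are illustrative.

```r
# Sample data from the question (1983-1997).
df <- data.frame(
  Year = 1983:1997,
  temp = c(.109, .091, -.100, .051, -.071, .101, .003, -.051,
           -.110, .134, .091, .122, .101, .087, .075)
)

width <- 5
start <- min(df$Year)

# Index of the 5-year block each year falls into (0, 1, 2, ...).
block <- (df$Year - start) %/% width

# Label such as "1983-1987". If the data end mid-block, the label still
# shows the full nominal range of that last block.
df$period <- paste(start + block * width,
                   start + block * width + width - 1, sep = "-")

# One mean per non-overlapping period.
binned <- aggregate(temp ~ period, data = df, FUN = mean)
print(binned)
```

The resulting binned data frame has one row per period and can be plotted the same way as mydat, for example with geom_point( aes( period, temp )).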
