
calculate moving average across columns in R

I have time series data and want a 3-year moving average. I have seen the TTR and SMA questions, but they all apply the rolling mean (moving average) down a single column, creating a new column with a number of NAs at the top that depends on the value of k.

I want the output to be a data frame laid out just like the original data, but holding the moving averages. Since my window is 3, each value is computed from the centre column and its 2 adjacent columns. The first and last columns can be dropped, since they have no adjacent column on one side.

The hypothetical data is given below:

   1961 1962 1963 1964 1965 1966 1967
1    9   13    8    4   15    1   19
2   14    2   10    6   15    7   17
3   16    7    1   18    3    9    6

As some elaboration was requested, here is my idea.

for 1962 <- c(9+13+8/3, 14+2+10/3, 16+7+1/3) and so on for successive columns. The first and the last columns can have NAs.

This type of problem generally has to do with reshaping the data. To compute the rolling means, the data should be in long format, but here it is in wide format. See this post on how to reshape the data from wide to long format.
Then compute the means with the function rollmean from package zoo.
And finally reshape back to wide format.

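library(dplyr)
library(tidyr)
# zoo must also be installed; it is called below as zoo::rollmean

# jj1 is assumed to be the wide data frame from the question (one column per year)
jj1 %>%
  mutate(id = row_number()) %>%          # remember which row each value came from
  pivot_longer(                          # wide -> long: one row per (id, year)
    cols = -id,
    names_to = 'year',
    values_to = 'value'
  ) %>%
  arrange(id, year) %>%
  group_by(id) %>%
  # centred 3-year mean within each original row; the ends are filled with NA
  mutate(value = zoo::rollmean(value, k = 3, fill = NA)) %>%
  pivot_wider(                           # long -> wide: back to one column per year
    id_cols = id,
    names_from = year,
    values_from = value
  ) %>%
  ungroup() %>%
  select(-id)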

You can do this with a simple loop in R:

#generating some dummy data
datad <- matrix(rnorm(100), ncol = 10)
colnames(datad) <- 2001:2010

ma <- list() #moving average
for(i in 2:(ncol(datad)-1)) {
  ma[[i-1]] <- apply(datad[, (i-1):(i+1)], 1, mean)
}

#convert back to matrix
ma <- Reduce(cbind, ma)
#getting original column name
colnames(ma) <- colnames(datad)[2:(ncol(datad)-1)]
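
The same computation can also be written without an explicit loop. The following is a minimal sketch, not part of the original answer, assuming a numeric matrix with one column per year; the names datad_demo, n, ma_vec and ma_full are hypothetical. It adds three shifted blocks of columns and then, as an optional extra step, pads the result with NA columns so that it keeps the shape of the input.

```r
# Hypothetical demo data with the same layout as the dummy data above
set.seed(1)
datad_demo <- matrix(rnorm(100), ncol = 10)
colnames(datad_demo) <- 2001:2010

n <- ncol(datad_demo)

# Left neighbour + centre + right neighbour, divided by the window size
ma_vec <- (datad_demo[, 1:(n - 2)] +
           datad_demo[, 2:(n - 1)] +
           datad_demo[, 3:n]) / 3
colnames(ma_vec) <- colnames(datad_demo)[2:(n - 1)]

# Optional: pad with NA columns so the result has the original dimensions
ma_full <- cbind(NA, ma_vec, NA)
colnames(ma_full) <- colnames(datad_demo)
```

Here ma_vec has the same columns as the loop result (2002 to 2009), while ma_full has all ten columns with NA in the first and last.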

Assuming that the question intended c((9+13+8)/3, (14+2+10)/3, (16+7+1)/3) as the values for 1962 and not the values shown there, rollmean can be used in either of the following manners. These one-liners give matrices as the result but as.data.frame can be used on the result if it is important that it be a data frame.

library(zoo)

t(apply(DF, 1, rollmean, 3))
##      1962   1963    1964    1965   1966
## 1 10.0000 8.3333  9.0000  6.6667 11.667
## 2  8.6667 6.0000 10.3333  9.3333 13.000
## 3  8.0000 8.6667  7.3333 10.0000  6.000

t(rollmean(t(DF), 3))
##      [,1]   [,2]    [,3]    [,4]   [,5]
## 1 10.0000 8.3333  9.0000  6.6667 11.667
## 2  8.6667 6.0000 10.3333  9.3333 13.000
## 3  8.0000 8.6667  7.3333 10.0000  6.000
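
The assumption about the intended formula matters because of operator precedence: in R, division binds more tightly than addition, so 9+13+8/3 divides only the 8 by 3. A small check using the first row of the sample data (the variable names are illustrative only):

```r
# As written in the question: only the last term is divided by 3
as_written <- 9 + 13 + 8 / 3

# With parentheses: the mean of the three values
as_intended <- (9 + 13 + 8) / 3

print(c(as_written = as_written, as_intended = as_intended))
```

Only the parenthesised form agrees with the 1962 value for row 1 in the rollmean output above.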

Note

The input in reproducible form:

Lines <- "
   1961 1962 1963 1964 1965 1966 1967
1    9   13    8    4   15    1   19
2   14    2   10    6   15    7   17
3   16    7    1   18    3    9    6"
DF <- read.table(text = Lines, check.names = FALSE)
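
If the result should keep all seven columns, with NA in the first and last, one option that needs no extra packages is stats::filter from base R. This is a sketch, not part of the original answers, under the assumption that a centred, equally weighted window of 3 is wanted; centred_ma and res are hypothetical names.

```r
Lines <- "
   1961 1962 1963 1964 1965 1966 1967
1    9   13    8    4   15    1   19
2   14    2   10    6   15    7   17
3   16    7    1   18    3    9    6"
DF <- read.table(text = Lines, check.names = FALSE)

# Centred 3-point moving average of one row; both ends become NA
centred_ma <- function(x) {
  as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
}

# apply() returns one column per row of DF, so transpose back
res <- t(apply(as.matrix(DF), 1, centred_ma))
colnames(res) <- colnames(DF)
res
```

The result is a matrix; as with the one-liners above, as.data.frame can be applied if a data frame is needed.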
