
How to calculate the mean of the previous 7 days at the same time of day

Hi, I have a data frame as shown below.

How can I replace the NAs in "value" so that the "output" column holds the average of the last 7 days at the same time of day? E.g. if the value for 2014-02-08 00:45 is NA, it should be replaced with the average of the previous 7 days, i.e. the mean of the values from Feb 1 to Feb 7 at the same time (00:45).

dates = c('21-01-2014 00:15', '21-01-2014 00:30','21-01-2014 00:45','22-01-2014 00:00','22-01-2014 00:30','22-01-2014 00:45','23-01-2014 00:00','23-01-2014 00:15','23-01-2014 00:45','25-01-2014 00:45','26-01-2014 00:45','26-01-2014 00:46','26-01-2014 00:30','27-02-2014 00:45','28-02-2014 00:45','29-03-2014 00:45','30-03-2014 00:00','30-03-2014 00:45','30-03-2014 00:45','31-03-2014 00:45','01-04-2014 00:45','02-04-2014 00:45','03-04-2014 00:45')
value = c(20,   5,  10, 23, NA, 22, 12, 10, NA, 12, NA, 4,  19, 12, 
          NA,   NA, 2,  2,  NA, 14, NA, 21, NA)
output =c(20,   5,  10, 23, 5,  22, 12, 10, 10, 12, 11, 4,  19, 12,
          14,   14, 2,  2,  11.6,   14, 12, 21, 13.28)

df=data.frame(dates, value,output)

df$dates = as.POSIXct(strptime(df$dates, format = "%d-%m-%Y %H:%M", tz = "GMT"))

Thanks in advance.

You can loop through the rows and replace each NA with the mean of the rows before it.

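library(data.table)
library(dplyr)  # only needed for the %>% pipe
df <- df %>% as.data.table()
for (index in seq_len(nrow(df))) {
  if (df[index, value] %>% is.na()) {
    # The (up to) 7 rows directly above the current one.
    # Note: this goes by row position, not by date or time of day.
    prev_rows <- seq_len(index - 1) %>% tail(7)
    if (length(prev_rows) > 0) {
      # na.rm = TRUE so that NAs inside the window do not make the mean NA
      df[index, value := df[prev_rows, value] %>% mean(na.rm = TRUE)]
    }
  }
}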

I used data.table because I am more familiar with it. You can convert back to a data.frame after the processing if you prefer.

Tell me if this is what you want.
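The loop above averages the preceding rows whatever their date or time. A minimal base R sketch that instead restricts the window to the same clock time within the previous 7 days could look like this. The helper name fill_same_time and the toy data are made up for illustration, and the sketch assumes the averages should be taken over the original (unfilled) values:

```r
# Toy data (made up): readings at 00:45 on four days, plus one at 01:00
toy <- data.frame(
  dates = as.POSIXct(c("2014-02-01 00:45", "2014-02-02 00:45",
                       "2014-02-03 00:45", "2014-02-03 01:00",
                       "2014-02-04 00:45"), tz = "GMT"),
  value = c(10, 20, NA, 99, NA)
)

# Hypothetical helper: fill each NA with the mean of the non-missing values
# recorded at the same clock time during the preceding `days` days
fill_same_time <- function(dates, value, days = 7) {
  clock <- format(dates, "%H:%M")  # time of day as text, e.g. "00:45"
  filled <- value
  for (i in which(is.na(value))) {
    # Age of every row relative to row i, in days (positive = earlier)
    age <- as.numeric(difftime(dates[i], dates, units = "days"))
    window <- clock == clock[i] & age > 0 & age <= days
    # Leave the NA in place when the window has no usable value
    if (any(!is.na(value[window]))) {
      filled[i] <- mean(value[window], na.rm = TRUE)
    }
  }
  filled
}

toy$output <- fill_same_time(toy$dates, toy$value)
```

Because the means are computed from value and written to a separate vector, an NA that was filled earlier never feeds into a later average. If you want filled values to count, read from filled instead of value inside the loop.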

I would join the data frame with itself, so that two rows match whenever one of them belongs to the group of rows that should be averaged for the other.

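library(data.table)
dt <- data.table(df)
# id: row number; dates_tmp1/dates_tmp2: copies of dates used as join keys;
# dates_7: start of the 7-day window; time: time of day (in GMT, like dates)
dt[ , c("id", "dates_tmp1", "dates_tmp2", "dates_7", "time")
 := list(1:nrow(dt), dates, dates, dates - as.difftime(7, units="days"),
         strftime(dates, format="%H:%M:%S", tz="GMT"))]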

This creates some temporary columns for the join, so that the original data is not overwritten.

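# Non-equi self-join: pair each row with the rows at the same time of day
# whose date lies between 7 days earlier and the row's own date
joined <- dt[dt, on=.(dates_tmp1>=dates_tmp1, dates_7<=dates_tmp2, time==time), allow.cartesian=TRUE]
# Average the matched values per row, ignoring NAs
mean_values <- joined[ , list(mean_value=mean(i.value, na.rm = TRUE)), by = "id"]
mean_values <- mean_values[order(id)]
head(mean_values)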
    id mean_value
 1:  1   20.00000
 2:  2    5.00000
 3:  3   10.00000
 4:  4   23.00000
 5:  5    5.00000
 6:  6   16.00000

Use these values to replace the NAs.
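One way to do that replacement is an update join on id. This is a minimal sketch that uses small made-up stand-ins for dt and mean_values so that it runs on its own. Turning NaN back into NA is an assumption about what you want when a window contains no data at all:

```r
library(data.table)

# Made-up stand-ins for the tables built above
dt <- data.table(id = 1:4, value = c(20, NA, 10, NA))
mean_values <- data.table(id = 1:4, mean_value = c(20, 5, 10, NaN))

# Update join: look up mean_value by id and overwrite value only where it is NA
dt[mean_values, on = "id",
   value := fifelse(is.na(value), i.mean_value, value)]

# mean(..., na.rm = TRUE) over an all-NA window gives NaN; restore NA there
dt[is.nan(value), value := NA_real_]
```

Rows that already had a value are left untouched, so the join can safely be run over the whole table.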

If you instead want the last 7 days that actually occur in the data, you can create a new column that enumerates the days and then do the same.

dt[ , c("id",  "time"):= list(1:nrow(dt),strftime(dates, format="%H:%M:%S", tz="GMT"))]
# days: 1, 2, 3, ... over the distinct dates observed for each time of day
dt[ , days := as.numeric(frank(as.Date(dates), ties.method = "dense")), by = time]
dt[ , days_7:=days - 7]
joined <- dt[dt, on=.(days>=days, days_7<=days, time==time), allow.cartesian=TRUE]
mean_values <- joined[ , list(mean_value=mean(i.value, na.rm = TRUE)), by = "id"]
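To see what the days column does, here is a small sketch on made-up data. It uses format() instead of strftime() for the time of day, which is a substitution on my part. The 00:45 slot is observed on three dates that are far apart, yet they are numbered consecutively, so "7 days back" means 7 observed days rather than 7 calendar days:

```r
library(data.table)

# Toy data (made up): three 00:45 readings on scattered dates, one at 00:30
toy <- data.table(
  dates = as.POSIXct(c("2014-01-21 00:45", "2014-01-25 00:45",
                       "2014-02-27 00:45", "2014-01-21 00:30"), tz = "GMT")
)
toy[, time := format(dates, "%H:%M:%S")]

# Dense rank of the calendar date within each time-of-day group
toy[, days := as.numeric(frank(as.Date(dates), ties.method = "dense")), by = time]
```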
