
For loop - select time window from day column

I need to adapt some code, which works perfectly on another data frame with a different setup, so that it selects a 2-day time window from the column Day. Specifically, I am interested in the day before day 0 (i.e. i - 1 and i, where i is the day of interest): the Count values of day i - 1 have to be added to the Count of day 0 (i).

Here is an example of my data frame:

df <- read.table(text = "
        Station   Day           Count
    1    33012  12448               4
    2    35004  12448               4
    3    35008  12448               4
    4    37006  12448               4
    5    21009   4835               3
    6    24005   4835               3
    7    27001   4835               3
    8    25005  12447               3
    9    29001  12447               3
    10   29002  12447               3
    11   29002  12446               3
    12   30001  12446               3
    13   31002  12446               3
    14   47007   4834               2
    15   49002   4834               2
    16   47004  12445               1
    17   51001  12449               1
    18   51003   4832               1
    19   52004   4836               1", header = TRUE)

My output should be:

           Station    Day           Count
        1    33012  12448               7
        2    35004  12448               7
        3    35008  12448               7
        4    37006  12448               7
        5    21009   4835               5
        6    24005   4835               5
        7    27001   4835               5
        8    29002  12446               4
        9    30001  12446               4
        10   31002  12446               4
        11   51001  12449               1
        12   51003   4832               1
        13   52004   4836               1
        14   25005  12447               0
        15   29001  12447               0
        16   29002  12447               0
        17   47007   4834               0
        18   49002   4834               0
        19   47004  12445               0

I am trying this code, but it doesn't work with my real dataframe:

for (i in unique(df$Day)) {
    temp <- df$Count[df$Day == i]  
    if(length(temp > 0)) {  
    condition1 <- df$Day == i - 1   
    if (any(condition1)) {
       df$Count[df$Day == i] <- mean(df$Count[condition1]) + df$Count[df$Day == i]
       df$Count[condition1] <- 0
            }
         }
}

The code seems right and it makes sense, but my output does not.

Can anyone help me?


@aichao's code works well.

If I want to consider the previous 30 days (i.e. day-30, day-29, day-28, ..., day-1, day0), is there a quick way to do it instead of writing 30 if statements (conditions)?

Thanks again @aichao for your help.

The following does what you want on the sample data you gave:

for (i in unique(df$Day)) {
  temp <- df$Count[df$Day == i]
  if (any(temp > 0)) {
    condition1 <- df$Day == i - 1
    condition1[which(df$Day == i - 1) < max(which(df$Day == i))] <- FALSE
    if (any(condition1)) {
      df$Count[df$Day == i] <- mean(df$Count[condition1]) + df$Count[df$Day == i]
      df$Count[condition1] <- 0
    }
  }
}
print(df[order(df$Count, decreasing = TRUE),])
##   Station   Day Count
##1    33012 12448     7
##2    35004 12448     7
##3    35008 12448     7
##4    37006 12448     7
##5    21009  4835     5
##6    24005  4835     5
##7    27001  4835     5
##11   29002 12446     4
##12   30001 12446     4
##13   31002 12446     4
##17   51001 12449     1
##18   51003  4832     1
##19   52004  4836     1
##8    25005 12447     0
##9    29001 12447     0
##10   29002 12447     0
##14   47007  4834     0
##15   49002  4834     0
##16   47004 12445     0

A key requirement gleaned from your comment, and missing from your implementation, is that only rows further down the data frame are considered when determining the previous day and its count. That is, you are processing the data frame rows as if they were ordered in time, rather than treating the values in the Day column as the time ordering. Therefore, for df$Day = 12449 there is no previous day to consider, since all rows with df$Day = 12448 precede it. As a result, the Count for df$Day = 12449 remains at 1 and, more importantly, the Counts for all rows with df$Day = 12448 are not zeroed out after processing df$Day = 12449.

To implement this, we need to filter condition1 further: every row with df$Day == i - 1 (previous day) that precedes the last row (highest row index) with df$Day == i (day of interest) is set to FALSE, using the line

condition1[which(df$Day == i - 1) < max(which(df$Day == i))] <- FALSE

Note that this solution assumes that equal values of the Day column are grouped together as blocks of rows, as in your sample data. Otherwise, your for loop over unique(df$Day) needs to be reconsidered completely and replaced with a loop over rows, in order to track the current row for the day of interest in the data frame.
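As a side note on the filter line: the comparison there yields one logical value per previous-day row, which R then recycles as an index over all rows. That is fine when the previous-day rows lie either all above or all below the day of interest, as in the sample data. A minimal sketch of an element-wise variant follows. It is a swap-in and not the line used above; the names rows, is_day and is_prev are my own, and the Station column is left out for brevity.

```r
# Sample data from the question (Station column omitted for brevity)
df <- data.frame(
  Day   = c(rep(12448, 4), rep(4835, 3), rep(12447, 3), rep(12446, 3),
            rep(4834, 2), 12445, 12449, 4832, 4836),
  Count = c(rep(4, 4), rep(3, 9), rep(2, 2), rep(1, 4))
)

rows <- seq_len(nrow(df))   # row positions, used as the time ordering

for (i in unique(df$Day)) {
  is_day <- df$Day == i
  if (any(df$Count[is_day] > 0)) {
    # previous-day rows that sit below the last row of day i;
    # both comparisons have one value per row, so nothing is recycled
    is_prev <- df$Day == i - 1 & rows > max(which(is_day))
    if (any(is_prev)) {
      df$Count[is_day]  <- df$Count[is_day] + mean(df$Count[is_prev])
      df$Count[is_prev] <- 0
    }
  }
}

print(df[order(df$Count, decreasing = TRUE), ])
```

On the sample data this gives the same counts as the loop above. It still processes days in the order returned by unique(df$Day), so it does not remove the block assumption by itself.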

In addition, a minor bug in your code was in the line

if(length(temp > 0)) {

The intent was to check whether any row for the day of interest has a Count greater than 0. However, comparison operators in R are vectorized, so temp > 0 returns a logical vector of the same length as its input temp. Therefore, length(temp > 0) is always positive unless temp itself has length 0 (i.e., is empty). To get what you intend, the line is changed to

if(any(temp > 0)) {
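To see the difference, here is a small illustration with a made-up temp vector (the values are mine, standing in for a day whose Counts have already been zeroed):

```r
temp <- c(0, 0, 0)            # every Count for the day is already zero

cmp <- temp > 0               # element-wise comparison, one value per element
print(cmp)
print(length(cmp))            # length of the comparison, not the number of TRUEs
print(any(cmp))               # TRUE only if at least one Count is positive

empty_len <- length(numeric(0) > 0)   # the only case where the length is 0
print(empty_len)
```

Because if() treats any non-zero number as TRUE, the length check lets the all-zero day through, while any() skips it.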

Update: new requirement regarding multiple previous days

The simplest way to address the new requirement is to put the body of code within the if (any(temp > 0)) {...} block into a function, call it accumulate.mean.count , and apply this function over a collection of previous days using sapply . The modifications are:

accumulate.mean.count <- function(this.day, lag) {
  condition1 <- df$Day == this.day - lag
  condition1[which(df$Day == this.day - lag) < max(which(df$Day == this.day))] <- FALSE
  if (any(condition1)) {
    df$Count[df$Day == this.day] <<- mean(df$Count[condition1]) + df$Count[df$Day == this.day]
    df$Count[condition1] <<- 0
  }
}

lags <- seq_len(30)

for (i in unique(df$Day)) {
  temp <- df$Count[df$Day == i]
  if (any(temp > 0)) {
    sapply(lags, accumulate.mean.count, this.day=i)
  }
}

print(df[order(df$Count, decreasing = TRUE),])

Notes:

  1. lag is the number of days previous to (i.e., that lag) the current day. A lag = 1 means the previous day, a lag = 2 means two days previous, and so on. lags is a collection of these. Here, lags <- seq_len(30) is a sequence from 1 to 30 over which accumulate.mean.count is applied, which is what you want. See this for an excellent overview on the *apply family of R functions. Note that lags need not be a sequence; it can be any collection of integers, such as c(1, 5, 10) for the previous day, 5 days previous and 10 days previous. The values do not even have to be positive if you want to roll in future days, but they should not be zero.

  2. Because of the lexical scoping rule of R , setting df$Count , which is a variable outside the scope of accumulate.mean.count , within the function accumulate.mean.count requires <<- instead of <- . See this for an explanation and note the dangers of using <<- mentioned there.
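If you would rather avoid <<-, one possible alternative is to pass the data frame in and return the modified copy. This is only a sketch under my own assumptions: the function name accumulate_mean_count and its signature are hypothetical, it uses an element-wise row filter (rows > max(which(is_day))) in place of the recycled index above, and the Station column is omitted for brevity.

```r
# Sample data from the question (Station column omitted for brevity)
df <- data.frame(
  Day   = c(rep(12448, 4), rep(4835, 3), rep(12447, 3), rep(12446, 3),
            rep(4834, 2), 12445, 12449, 4832, 4836),
  Count = c(rep(4, 4), rep(3, 9), rep(2, 2), rep(1, 4))
)

# Hypothetical variant: takes df as an argument and returns the updated df,
# so no assignment outside the function's scope is needed
accumulate_mean_count <- function(df, this.day, lag) {
  rows    <- seq_len(nrow(df))
  is_day  <- df$Day == this.day
  # rows of the lagged day that sit below the last row of this.day
  is_prev <- df$Day == this.day - lag & rows > max(which(is_day))
  if (any(is_prev)) {
    df$Count[is_day]  <- df$Count[is_day] + mean(df$Count[is_prev])
    df$Count[is_prev] <- 0
  }
  df
}

lags <- seq_len(2)

for (i in unique(df$Day)) {
  if (any(df$Count[df$Day == i] > 0)) {
    for (lag in lags) df <- accumulate_mean_count(df, i, lag)
  }
}

print(df[order(df$Count, decreasing = TRUE), ])
```

The inner for loop replaces sapply because each call has to see the data frame returned by the previous one.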

I do not have enough data to test lags <- seq_len(30) , but for seq_len(1) , I recovered the original result, and for seq_len(2) , I got

##   Station   Day Count
##1    33012 12448    10
##2    35004 12448    10
##3    35008 12448    10
##4    37006 12448    10
##5    21009  4835     5
##6    24005  4835     5
##7    27001  4835     5
##16   47004 12445     1
##17   51001 12449     1
##18   51003  4832     1
##19   52004  4836     1
##8    25005 12447     0
##9    29001 12447     0
##10   29002 12447     0
##11   29002 12446     0
##12   30001 12446     0
##13   31002 12446     0
##14   47007  4834     0
##15   49002  4834     0

which I believe is what you would want.

