
Assign value for each row in dataframe based on row values in R

I am trying to calculate the mean of a column over the subset of a dataframe up to a specific date. I have created a second dataframe containing all the dates for which I want to calculate that mean.

For example I have a dataframe containing:

> df
      date  value
2019-01-01      4
2019-01-02      2
2019-01-02      3
2019-01-03      7

and a dataframe containing the dates:

> a
      date
2019-01-01
2019-01-02
2019-01-03

I would like to get, for each date in a, the mean of value in df up to and including that date:

> a
      date  mean
2019-01-01     4
2019-01-02     3
2019-01-03     4

I tried simply

calculate_mean <- function(input) {
  sub <- subset(df, date < input)
  return(mean(sub$value))
}
a$mean <- calculate_mean(a$date)

Instead of input being the single date of that row, it is the whole vector of dates in a. Therefore the mean value is the same for each row. How can I pass just the single date for that row?

For now I have worked around it with a for loop, which I doubt is the intended solution.

One option is a non-equi join with data.table:

library(data.table)
setDT(df)[a, .(mean = mean(value)), on = .(date <= date), by = .EACHI]
#          date mean
#1: 2019-01-01    4
#2: 2019-01-02    3
#3: 2019-01-03    4
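
Note that setDT(df) converts df to a data.table by reference. Below is a hedged variant of the same join that leaves df as a plain data.frame and writes the result back into a. It assumes the join returns one row per row of a, in the same order, which is how by = .EACHI behaves. The names dt and res are illustrative only.

```r
library(data.table)

# Same example data as in the question
df <- data.frame(
  date  = as.Date(c("2019-01-01", "2019-01-02", "2019-01-02", "2019-01-03")),
  value = c(4L, 2L, 3L, 7L)
)
a <- data.frame(date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03")))

# Work on a copy so that df itself stays a plain data.frame
dt <- as.data.table(df)

# For each row of a, average the values of all dt rows with dt$date <= a$date
res <- dt[a, .(mean = mean(value)), on = .(date <= date), by = .EACHI]

# One result row per row of a, in the same order, so it can be attached directly
a$mean <- res$mean
a
```

The condition date <= date reads as "date in df is on or before date in a", which matches the inclusive cutoff in the expected output.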

Data:

df <- structure(list(date = structure(c(17897, 17898, 17898, 17899), class = "Date"), 
    value = c(4L, 2L, 3L, 7L)), class = "data.frame", row.names = c(NA, 
-4L))

a <- structure(list(date = structure(c(17897, 17898, 17899), class = "Date")), row.names = c(NA, 
-3L), class = "data.frame")
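
For comparison, here is a minimal base R sketch of the row-wise approach the question asks about. In the question's attempt, calculate_mean(a$date) is evaluated once with the whole date vector, so a single mean is computed and recycled across all rows. The sketch instead calls a helper once per date. It assumes the cutoff is inclusive (<=), as the expected output implies, and mean_up_to is a hypothetical helper name.

```r
# Same example data as in the question
df <- data.frame(
  date  = as.Date(c("2019-01-01", "2019-01-02", "2019-01-02", "2019-01-03")),
  value = c(4L, 2L, 3L, 7L)
)
a <- data.frame(date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03")))

# mean_up_to() is an illustrative helper, not part of any package.
# It takes ONE date and averages all values dated on or before it.
mean_up_to <- function(cutoff) {
  mean(df$value[df$date <= cutoff])
}

# Call the helper once per element of a$date; indexing keeps the Date class
a$mean <- vapply(
  seq_along(a$date),
  function(i) mean_up_to(a$date[i]),
  numeric(1)
)
a
#         date mean
# 1 2019-01-01    4
# 2 2019-01-02    3
# 3 2019-01-03    4
```

This rescans df for every date, so on large inputs the non-equi join is likely the better choice.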
