How to calculate a mean on a data.table in R based on several conditions

I have data like the following:

library(lubridate)
library(dplyr)
library(data.table)
MWE <- data.table(
  Date=rep(seq(ymd("2020-1-1"), ymd("2020-3-30"), by = "days"),each=6),
  Country=rep(c("France","United States","Germany"),times=90*2),  # 540 values, same length as Date
  TransportType=rep(c("Train","Cars"),each=3,times=90),            # 540 values, same length as Date
  Value=rnorm(90*6,2,3)
  )

I want to create a new variable that is the mean of Value:

  • By Country and TransportType
  • By weekday
  • Based on dates before March only (but assigned to the March rows too)

So the mean should be computed from the January and February data only, but it should be present in the table for the whole period.

I have managed to do the first two (or I think so; I am still checking):

MWE_2 <- MWE %>%
  .[,JourSem:=weekdays(Date)] %>%
  .[,Moyenne:=mean(Value),by=.(Country,JourSem,TransportType)]

But I am unsure how to add another condition to that. I think I am close with this:

MWE_3 <- MWE %>%
  .[,JourSem:=weekdays(Date)] %>%
  .[Date <= "2020-02-29",Moyenne:=mean(Value),by=.(Country,JourSem,TransportType)]

But then the March dates have no value, which is logical since they are filtered out, and that is not what I want.

We can first calculate the mean over January and February for each weekday and then join the result onto the March data.

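library(data.table)
library(magrittr)  # provides the %>% pipe used below

# Add the weekday name to every row (by reference)
MWE[, JourSem:=weekdays(Date)]

# Mean of Value per weekday, computed on January and February only
d1 <- MWE[Date <= as.Date("2020-02-29")] %>%
        .[, .(Moyenne = mean(Value)), JourSem]

# Attach the per-weekday mean to the March rows
MWE[Date > as.Date("2020-02-29")][d1, on = 'JourSem']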
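The code above averages by weekday only and returns just the March rows. A possible variation that is closer to what the question asks (mean by Country, TransportType and weekday, computed on January and February but written to every row) is sketched below. It is only a sketch under assumptions: the table `DT`, the `cutoff` variable, the `ref` lookup table, the `Moyenne2` column and the fixed seed are illustrative names and choices, not part of the original post.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Illustrative data: 90 days x 3 countries x 2 transport types = 540 rows
dates <- seq(as.Date("2020-01-01"), as.Date("2020-03-30"), by = "day")
DT <- data.table(
  Date          = rep(dates, each = 6),
  Country       = rep(c("France", "United States", "Germany"), times = 90 * 2),
  TransportType = rep(rep(c("Train", "Cars"), each = 3), times = 90),
  Value         = rnorm(90 * 6, mean = 2, sd = 3)
)

cutoff <- as.Date("2020-02-29")   # last day used for the reference mean
DT[, JourSem := weekdays(Date)]   # weekday name (locale dependent)

# Option 1: compute the reference means on January and February only,
# then copy them onto every matching row with an update join
ref <- DT[Date <= cutoff,
          .(Moyenne = mean(Value)),
          by = .(Country, JourSem, TransportType)]
DT[ref, on = .(Country, JourSem, TransportType), Moyenne := i.Moyenne]

# Option 2: subset Value inside each group, in a single step
DT[, Moyenne2 := mean(Value[Date <= cutoff]),
   by = .(Country, JourSem, TransportType)]

# Both options should give the same column
all.equal(DT$Moyenne, DT$Moyenne2)
```

The update join keeps the reference table `ref` around for inspection, while the second form avoids the intermediate table. In both cases the March rows receive the mean of their group computed from the earlier months.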
