Using match on multiple conditions to generate values in R

Question

I currently have the following data format:

df = data.frame(c(rep("A", 12), rep("B", 12)), rep(1:12, 2), seq(-12, 11))
colnames(df) = c("station", "month", "mean")
df

df_master = data.frame(c(rep("A", 10), rep("B", 10)), rep(c(27:31, 1:5), 2), rep(c(rep(1, 5), rep(2, 5)), 2), rep(seq(-4,5), 2))
colnames(df_master) = c("station", "day", "month", "value")
df_master

Effectively, df holds a monthly mean for each station, and I want to add a new variable to df_master that gives the difference between each daily observation and its monthly mean. I have managed to do this with an overall average that includes all the data, but since the mean values vary between stations, I would like to make the new variable station-specific.

I have tried the following code to match the monthly value, but it currently doesn't account for differences between stations:

library(dplyr)  # needed for %>% and mutate()
df_master$mean = df$mean[match(df_master$month, df$month)]
df_master = df_master %>% mutate(diff = value - mean)
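A minimal sketch of why the month-only lookup falls short, rebuilding the example data with named columns (the names idx and matched_station are only illustrative). match() returns the position of the first match, so every row of df_master is paired with a row of station A:

```r
# Example data rebuilt from the question
df <- data.frame(station = rep(c("A", "B"), each = 12),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = rep(c("A", "B"), each = 10),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(rep(1:2, each = 5), 2),
                        value   = rep(seq(-4, 5), 2))

# Position in df of the first row whose month equals each df_master month
idx <- match(df_master$month, df$month)

# Station of the df row that each df_master row was paired with
matched_station <- df$station[idx]

# Compare the station we wanted with the station we actually got
table(wanted = df_master$station, got = matched_station)
```

In the resulting table, the rows for station B are all paired with station A's means, which is why the lookup has to include station as well as month.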

How can I take this further so that the means are matched per station?

Answer 1

If you convert them to data.tables, you can add the difference column with an update join, joining df_master with df on the condition that the values of both station and month are equal.

library(data.table)
setDT(df_master)
setDT(df)

df_master[df, on = .(station, month), 
          diff_monthmean := value - i.mean]

df_master
#     station day month value diff_monthmean
#  1:       A  27     1    -4              8
#  2:       A  28     1    -3              9
#  3:       A  29     1    -2             10
#  4:       A  30     1    -1             11
#  5:       A  31     1     0             12
#  6:       A   1     2     1             12
#  7:       A   2     2     2             13
#  8:       A   3     2     3             14
#  9:       A   4     2     4             15
# 10:       A   5     2     5             16
# 11:       B  27     1    -4             -4
# 12:       B  28     1    -3             -3
# 13:       B  29     1    -2             -2
# 14:       B  30     1    -1             -1
# 15:       B  31     1     0              0
# 16:       B   1     2     1              0
# 17:       B   2     2     2              1
# 18:       B   3     2     3              2
# 19:       B   4     2     4              3
# 20:       B   5     2     5              4

Answer 2

With dplyr, using a left join:

library(dplyr)
left_join(df_master, df, by = c('station', 'month')) %>% 
        mutate(monthdiff  = value - mean) %>%
        select(-mean)

Answer 3

Another option could be:

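# Note: merge() sorts its result by the key columns by default, so this
# relies on df_master already being ordered by station and month.
transform(df_master, 
          diff = value - merge(df_master, df, by = c('station', 'month'), all.x = TRUE)$mean)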

Or, using match with interaction:

transform(df_master, 
          diff = value - df$mean[match(interaction(df_master[c("month", "station")]),
                                       interaction(df[c("month", "station")]))])
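The same idea can be sketched with a pasted key instead of interaction(). This is only an illustrative variant, not part of the original answer: the names key_df and key_master and the "_" separator are assumptions, and the separator is safe here only because no station name or month contains it.

```r
# Example data rebuilt from the question
df <- data.frame(station = rep(c("A", "B"), each = 12),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = rep(c("A", "B"), each = 10),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(rep(1:2, each = 5), 2),
                        value   = rep(seq(-4, 5), 2))

# Build one composite key per row from station and month
key_df     <- paste(df$station, df$month, sep = "_")
key_master <- paste(df_master$station, df_master$month, sep = "_")

# Look up each daily row's station-specific monthly mean and subtract it
df_master$diff <- df_master$value - df$mean[match(key_master, key_df)]

df_master
```

Because match() works row by row on df_master, this keeps the original row order, and any station/month combination missing from df yields NA rather than a wrong value.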

Answer 1: 2, 2020-01-06 16:24:08
Answer 2: 2, accepted, 2020-01-06 16:25:04
Answer 3: 1, 2020-01-06 16:30:50
