How to create a new variable for a group of observations based on another variable within that group

Question

I'm trying to add a new variable that is based on the observation for one level of a factor within groups in my dataset. I've been trying to use various dplyr functions (filter, select, mutate, group_by) but can't figure out how to make them work together to accomplish my goal.

Here is a sample of my data:

  rep   rate       n  mort   avg
   <fct> <fct>  <int> <dbl> <dbl>
 1 1     0.747     10     7   0.7
 2 1     0.373     10     7   0.7
 3 1     0.187     10     6   0.6
 4 1     0.0933    10     0   0  
 5 1     0.00      10     1   0.1
 6 2     0.747     10     7   0.7
 7 2     0.373     10     5   0.5
 8 2     0.187     10     1   0.1
 9 2     0.0933    10     4   0.4
10 2     0.00      10     0   0

What I'm hoping to accomplish is to create a new variable called cont that takes the value of avg from the row where rate == "0.00". This value would be the same for every observation within the same rep group. The final product would be a table similar to the one below:

  rep   rate       n  mort   avg  cont
   <fct> <fct>  <int> <dbl> <dbl> <dbl>
 1 1     0.747     10     7   0.7  0.1
 2 1     0.373     10     7   0.7  0.1
 3 1     0.187     10     6   0.6  0.1
 4 1     0.0933    10     0   0    0.1
 5 1     0.00      10     1   0.1  0.1
 6 2     0.747     10     7   0.7  0
 7 2     0.373     10     5   0.5  0
 8 2     0.187     10     1   0.1  0
 9 2     0.0933    10     4   0.4  0
10 2     0.00      10     0   0    0

I've tried the following code: data %>% group_by(rep) %>% filter(rate == "0.00") %>% select(avg), which results in a data frame containing the values I want to add as the new variable:

  rep     avg
  <fct> <dbl>
1 1       0.1
2 2       0  
3 3       0.1
4 4       0.3
5 5       0  
6 6       0  
7 7       0  
8 8       0

My problem now is that I have no idea how to create the new variable for each observation within the rep group. I'm not sure how to use mutate properly in this situation. Thank you in advance for any help!

Answer 1

Assuming there is only one occurrence of rate == "0.00" in each group, we can do

library(dplyr)
df %>%
   group_by(rep) %>%
   mutate(cont = avg[rate == "0.00"])

#   rep   rate       n  mort   avg  cont
#  <fct> <fct>  <int> <dbl> <dbl> <dbl>
# 1 1     0.747     10     7   0.7   0.1
# 2 1     0.373     10     7   0.7   0.1
# 3 1     0.187     10     6   0.6   0.1
# 4 1     0.0933    10     0   0     0.1
# 5 1     0.00      10     1   0.1   0.1
# 6 2     0.747     10     7   0.7   0  
# 7 2     0.373     10     5   0.5   0  
# 8 2     0.187     10     1   0.1   0  
# 9 2     0.0933    10     4   0.4   0  
#10 2     0.00      10     0   0     0
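If dplyr is not available, the same group-wise lookup can be sketched in base R with ave(), which applies a function to each rep group and recycles the single value it returns back to every row of that group. This is a minimal sketch, assuming (as above) exactly one rate == "0.00" row per rep; the data frame df_base is a hypothetical copy of just the rep, rate and avg columns of the sample, with rate stored as character rather than factor.

```r
# Hypothetical copy of the rep, rate and avg columns of the sample data
df_base <- data.frame(
  rep  = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)),
  rate = c("0.747", "0.373", "0.187", "0.0933", "0.00",
           "0.747", "0.373", "0.187", "0.0933", "0.00"),
  avg  = c(0.7, 0.7, 0.6, 0, 0.1, 0.7, 0.5, 0.1, 0.4, 0)
)

# ave() splits the row indices by rep, calls FUN on each group's indices,
# and recycles the one value FUN returns to every row of that group.
# FUN picks the avg of the row whose rate is "0.00" (assumed unique per rep).
df_base$cont <- ave(
  seq_len(nrow(df_base)), df_base$rep,
  FUN = function(i) df_base$avg[i][df_base$rate[i] == "0.00"]
)

df_base$cont
```

Passing row indices rather than avg itself is what lets FUN look at both rate and avg for the same group, since ave() only hands FUN the subset of its first argument.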

If there is more than one occurrence, we can use which.max to select the first one

df %>% group_by(rep) %>% mutate(cont = avg[which.max(rate == "0.00")])
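One caveat worth checking with which.max(): on a logical vector it returns the position of the first TRUE, but if a group has no "0.00" row at all it silently returns 1, so cont would pick up that group's first avg. match() returns NA in that case instead. A small base-R check of that behaviour, using a hypothetical group invented for illustration:

```r
# Hypothetical group with no "0.00" (control) row
rate <- c("0.747", "0.373", "0.187")
avg  <- c(0.7, 0.5, 0.1)

# which.max() on an all-FALSE logical vector returns 1 (the first maximum),
# so this quietly returns avg[1] instead of flagging the missing control row
cont_which <- avg[which.max(rate == "0.00")]

# match() returns NA when there is no match, so the gap stays visible
cont_match <- avg[match("0.00", rate)]

cont_which
cont_match
```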

Using data.table, we can do

library(data.table)
setDT(df)[, cont := avg[rate == "0.00"], by = rep]

Data

df <- structure(list(rep = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), rate = structure(c(5L, 
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("0.00", "0.0933", 
"0.187", "0.373", "0.747"), class = "factor"), n = c(10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mort = c(7, 7, 6, 0, 
1, 7, 5, 1, 4, 0), avg = c(0.7, 0.7, 0.6, 0, 0.1, 0.7, 0.5, 0.1, 
0.4, 0)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10"), class = "data.frame")

Answer 2

We can use match

library(dplyr)
df  %>%
   group_by(rep) %>% 
   mutate(cont = avg[match("0.00", rate)])
# A tibble: 10 x 6
# Groups:   rep [2]
#   rep   rate       n  mort   avg  cont
#   <fct> <fct>  <int> <dbl> <dbl> <dbl>
# 1 1     0.747     10     7   0.7   0.1
# 2 1     0.373     10     7   0.7   0.1
# 3 1     0.187     10     6   0.6   0.1
# 4 1     0.0933    10     0   0     0.1
# 5 1     0.00      10     1   0.1   0.1
# 6 2     0.747     10     7   0.7   0  
# 7 2     0.373     10     5   0.5   0  
# 8 2     0.187     10     1   0.1   0  
# 9 2     0.0933    10     4   0.4   0  
#10 2     0.00      10     0   0     0

Or with data.table

library(data.table)
setDT(df)[, cont := avg[match("0.00", rate)], rep]

Or use a join, as @thelatemail suggested

setDT(df)[df[rate=="0.00"], on= .(rep), cont := i.avg]

Note: both match-based methods work even if there are duplicate values, because match returns only the index of the first match.
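The same "filter the control rows, then join them back by rep" idea can also be sketched in base R, without dplyr or data.table. This is a hedged sketch on a hypothetical data frame (df_dup, not from the question) that deliberately contains two "0.00" rows in rep 1, to show that match() keeps only the first hit while preserving the original row order:

```r
# Hypothetical data: rep 1 has two "0.00" rows (avg 0.1, then 0.9)
df_dup <- data.frame(
  rep  = c(1, 1, 1, 2, 2),
  rate = c("0.747", "0.00", "0.00", "0.747", "0.00"),
  avg  = c(0.7, 0.1, 0.9, 0.7, 0)
)

# Keep only the control rows, like the filter(rate == "0.00") step in the question
ctrl <- df_dup[df_dup$rate == "0.00", ]

# For each row, match() finds the first control row with the same rep,
# so the duplicated 0.9 in rep 1 is ignored and row order is unchanged
df_dup$cont <- ctrl$avg[match(df_dup$rep, ctrl$rep)]

df_dup
```

Unlike merge(), which sorts the result by the join key, indexing with match() leaves the rows of df_dup in their original order.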

Data

df <- structure(list(rep = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), rate = structure(c(5L, 
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("0.00", "0.0933", 
"0.187", "0.373", "0.747"), class = "factor"), n = c(10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mort = c(7, 7, 6, 0, 
1, 7, 5, 1, 4, 0), avg = c(0.7, 0.7, 0.6, 0, 0.1, 0.7, 0.5, 0.1, 
0.4, 0)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10"), class = "data.frame")

2 answers

Answer 1: score 4, 2019-08-27 02:01:14

Answer 2: score 1, 2019-08-27 03:10:53