
How do I create a new variable for a group of observations based on another variable specific to that group?

I'm trying to add a new variable that is based on the observation for one level of a factor within each group in my dataset. I've been trying to use various dplyr functions (filter, select, mutate, group_by) but can't figure out how to get them to work together to accomplish my goal.

Here is a sample of my data:

  rep   rate       n  mort   avg
   <fct> <fct>  <int> <dbl> <dbl>
 1 1     0.747     10     7   0.7
 2 1     0.373     10     7   0.7
 3 1     0.187     10     6   0.6
 4 1     0.0933    10     0   0  
 5 1     0.00      10     1   0.1
 6 2     0.747     10     7   0.7
 7 2     0.373     10     5   0.5
 8 2     0.187     10     1   0.1
 9 2     0.0933    10     4   0.4
10 2     0.00      10     0   0  

What I'm hoping to accomplish is to create a new variable called cont that takes the value of the avg variable from the row where rate == "0.00". This variable would be the same for every observation within the same rep group. The final product would be a table similar to the one below:

  rep   rate       n  mort   avg  cont
   <fct> <fct>  <int> <dbl> <dbl> <dbl>
 1 1     0.747     10     7   0.7  0.1
 2 1     0.373     10     7   0.7  0.1
 3 1     0.187     10     6   0.6  0.1
 4 1     0.0933    10     0   0    0.1
 5 1     0.00      10     1   0.1  0.1
 6 2     0.747     10     7   0.7  0
 7 2     0.373     10     5   0.5  0
 8 2     0.187     10     1   0.1  0
 9 2     0.0933    10     4   0.4  0
10 2     0.00      10     0   0    0

I've tried the following code: data %>% group_by(rep) %>% filter(rate == "0.00") %>% select(avg), which results in a data frame containing the values that I want to add as the new variable:

  rep     avg
  <fct> <dbl>
1 1       0.1
2 2       0  
3 3       0.1
4 4       0.3
5 5       0  
6 6       0  
7 7       0  
8 8       0  

My problem now is that I have no idea how to create the new variable for each observation within the rep group. I'm not sure how to use mutate properly in this situation. Thank you in advance for any help!

Assuming there is exactly one occurrence of rate == "0.00" in each group, we can do

library(dplyr)
df %>%
   group_by(rep) %>%
   mutate(cont = avg[rate == "0.00"])

#   rep   rate       n  mort   avg  cont
#  <fct> <fct>  <int> <dbl> <dbl> <dbl>
# 1 1     0.747     10     7   0.7   0.1
# 2 1     0.373     10     7   0.7   0.1
# 3 1     0.187     10     6   0.6   0.1
# 4 1     0.0933    10     0   0     0.1
# 5 1     0.00      10     1   0.1   0.1
# 6 2     0.747     10     7   0.7   0  
# 7 2     0.373     10     5   0.5   0  
# 8 2     0.187     10     1   0.1   0  
# 9 2     0.0933    10     4   0.4   0  
#10 2     0.00      10     0   0     0  
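The one-row-per-group assumption can be checked before running the mutate. Below is a minimal sketch in base R; the small data frame is made up for illustration and only reuses the column names from the question, and n_control is a hypothetical helper name.

```r
# Toy data with the same column names as the question (values are illustrative)
df <- data.frame(
  rep  = factor(c(1, 1, 1, 2, 2, 2)),
  rate = factor(c("0.747", "0.373", "0.00", "0.747", "0.373", "0.00")),
  avg  = c(0.7, 0.7, 0.1, 0.7, 0.5, 0)
)

# Count the rows with rate == "0.00" in each rep group
n_control <- tapply(df$rate == "0.00", df$rep, sum)

# Stop early if any group has zero or several such rows
stopifnot(all(n_control == 1))
```

If the check fails, avg[rate == "0.00"] would return zero or several values for that group, which no longer fits a single value per group.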

If there is more than one occurrence, we can use which.max to select the first one:

df %>% group_by(rep) %>% mutate(cont = avg[which.max(rate == "0.00")])
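One thing to keep in mind with which.max is the case where a group has no "0.00" row at all: on an all-FALSE vector it returns 1, so the first row's avg would be used silently. Base R's match() returns NA in the same situation. A small sketch with made-up vectors standing in for one hypothetical group:

```r
# A hypothetical group that contains no "0.00" rate
rate <- c("0.747", "0.373", "0.187")
avg  <- c(0.7, 0.7, 0.6)

# which.max returns the index of the first maximum; all values are FALSE, so it is 1
idx_whichmax <- which.max(rate == "0.00")

# match returns NA when the value is not found
idx_match <- match("0.00", rate)

avg[idx_whichmax]  # the first row's avg, not a control value
avg[idx_match]     # NA
```

Which behaviour is preferable depends on whether a missing control row should be flagged as NA or not.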

Using data.table, we can do

library(data.table)
setDT(df)[, cont := avg[rate == "0.00"], by = rep]

Data

df <- structure(list(rep = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), rate = structure(c(5L, 
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("0.00", "0.0933", 
"0.187", "0.373", "0.747"), class = "factor"), n = c(10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mort = c(7, 7, 6, 0, 
1, 7, 5, 1, 4, 0), avg = c(0.7, 0.7, 0.6, 0, 0.1, 0.7, 0.5, 0.1, 
0.4, 0)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10"), class = "data.frame")

We can use match:

library(dplyr)
df  %>%
   group_by(rep) %>% 
   mutate(cont = avg[match("0.00", rate)])
# A tibble: 10 x 6
# Groups:   rep [2]
#   rep   rate       n  mort   avg  cont
#   <fct> <fct>  <int> <dbl> <dbl> <dbl>
# 1 1     0.747     10     7   0.7   0.1
# 2 1     0.373     10     7   0.7   0.1
# 3 1     0.187     10     6   0.6   0.1
# 4 1     0.0933    10     0   0     0.1
# 5 1     0.00      10     1   0.1   0.1
# 6 2     0.747     10     7   0.7   0  
# 7 2     0.373     10     5   0.5   0  
# 8 2     0.187     10     1   0.1   0  
# 9 2     0.0933    10     4   0.4   0  
#10 2     0.00      10     0   0     0  

Or with data.table:

library(data.table)
setDT(df)[, cont := avg[match("0.00", rate)], rep]

Or using a join, as @thelatemail suggested:

setDT(df)[df[rate=="0.00"], on= .(rep), cont := i.avg]
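The same join idea can also be sketched in dplyr, building on the filter/select attempt from the question. This is only an illustration: the data frame is a made-up subset with the question's column names, and controls and result are hypothetical names.

```r
library(dplyr)

# Toy data with the same column names as the question (values are illustrative)
df <- data.frame(
  rep  = factor(c(1, 1, 1, 2, 2, 2)),
  rate = factor(c("0.747", "0.373", "0.00", "0.747", "0.373", "0.00")),
  avg  = c(0.7, 0.7, 0.1, 0.7, 0.5, 0)
)

# One row per rep holding the avg of the "0.00" row, renamed to cont
controls <- df %>%
  filter(rate == "0.00") %>%
  select(rep, cont = avg)

# Attach that value to every row of the matching rep
result <- left_join(df, controls, by = "rep")
```

If a rep had several "0.00" rows, left_join would repeat that rep's rows once per match, so this version also relies on one such row per group.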

Note: both match-based methods work even if there are duplicate values, because match returns only the index of the first match.

