How to create a new column in a dataframe based on grouped permutations of another column
I have a data frame like this:
df <- data.frame(grouping = c(rep("site1_1",9), rep("site2_1",9)),
var = c(rep("P", 3), rep("G", 3), rep("B",3),rep("P", 3), rep("B", 3), rep("G",3)),
order= c(rep(0, 3), rep(1, 3), rep(2,3),rep(0, 3), rep(1, 3), rep(2,3)))
grouping var order
1 site1_1 P 0
2 site1_1 P 0
3 site1_1 P 0
4 site1_1 G 1
5 site1_1 G 1
6 site1_1 G 1
7 site1_1 B 2
8 site1_1 B 2
9 site1_1 B 2
10 site2_1 P 0
11 site2_1 P 0
12 site2_1 P 0
13 site2_1 B 1
14 site2_1 B 1
15 site2_1 B 1
16 site2_1 G 2
17 site2_1 G 2
18 site2_1 G 2
I have a column called grouping that holds a unique ID for each group (an ID is never reused for another group). Within each grouping, var takes three values (P, G, and B), and each value is repeated several times within the grouping, as shown above.
The order is always P, G, B or P, B, G. Within a group, P is always 0, and B and G take the values 1 and 2: they can never both be 1 (or both be 2) within the same group. Across groups, whether B or G gets 1 or 2 is random.
The order column shows the position that each var takes within its group.
I would like to add a new column that labels the entire grouping (all of its P, B, and G rows) according to whether B comes before G or vice versa.
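Since any labeling approach depends on these rules actually holding, it can be worth checking them first. The following is a minimal dplyr sketch, assuming the column names from the example above; `checks` is just a hypothetical helper name. It collapses the repeated rows and then tests each grouping against the stated rules.

```r
library(dplyr)

df <- data.frame(grouping = c(rep("site1_1", 9), rep("site2_1", 9)),
                 var = c(rep("P", 3), rep("G", 3), rep("B", 3), rep("P", 3), rep("B", 3), rep("G", 3)),
                 order = c(rep(0, 3), rep(1, 3), rep(2, 3), rep(0, 3), rep(1, 3), rep(2, 3)),
                 stringsAsFactors = FALSE)

# One row per distinct (grouping, var, order) combination; repeats are dropped
checks <- df %>%
  distinct(grouping, var, order) %>%
  group_by(grouping) %>%
  summarise(
    valid = n() == 3 &&                       # exactly three distinct (var, order) pairs
      setequal(var, c("P", "G", "B")) &&      # P, G and B are all present
      all(order[var == "P"] == 0) &&          # P is always 0
      setequal(order[var != "P"], c(1, 2)),   # B and G split orders 1 and 2
    .groups = "drop"
  )

# Stops with an error if any grouping breaks the rules
stopifnot(all(checks$valid))
```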
This is an example of what that would look like:
grouping var order label
1 site1_1 P 0 Gfirst
2 site1_1 P 0 Gfirst
3 site1_1 P 0 Gfirst
4 site1_1 G 1 Gfirst
5 site1_1 G 1 Gfirst
6 site1_1 G 1 Gfirst
7 site1_1 B 2 Gfirst
8 site1_1 B 2 Gfirst
9 site1_1 B 2 Gfirst
10 site2_1 P 0 Bfirst
11 site2_1 P 0 Bfirst
12 site2_1 P 0 Bfirst
13 site2_1 B 1 Bfirst
14 site2_1 B 1 Bfirst
15 site2_1 B 1 Bfirst
16 site2_1 G 2 Bfirst
17 site2_1 G 2 Bfirst
18 site2_1 G 2 Bfirst
I am unclear how to do this.
Using dplyr, I start with:
df %>% group_by(grouping) %>% mutate(label = .......
But after that I'm lost as to how to make the label conditional on the order of P, B, and G, and how to account for the fact that each of them is repeated multiple times in each group.
I looked at this related question:
How can I create a new column in a dataframe based on permutations of other columns?
but I'm unclear how to adapt its answers, given that I need to group by the grouping column and have to handle a varying number of repeats of each variable (each grouping can contain anywhere from 3 to 15 P's, B's, and G's).
Any help is greatly appreciated.
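The varying number of repeats does not matter if you first reduce each grouping to one label and then join it back. Here is a minimal sketch, assuming (as stated above) that whichever of B or G has order 1 is the one that comes first. `df2` is hypothetical data with unequal repeat counts, and `labels` and `labelled` are illustrative names.

```r
library(dplyr)

# Hypothetical data: same layout as the question's df, but with unequal
# numbers of rows per var (within the 3-15 range mentioned above)
df2 <- data.frame(
  grouping = c(rep("site1_1", 3 + 7 + 15), rep("site2_1", 5 + 12 + 4)),
  var = c(rep(c("P", "G", "B"), times = c(3, 7, 15)),
          rep(c("P", "B", "G"), times = c(5, 12, 4))),
  order = c(rep(0:2, times = c(3, 7, 15)),
            rep(0:2, times = c(5, 12, 4))),
  stringsAsFactors = FALSE
)

# One row per grouping: whichever of B/G has order 1 comes first.
# distinct() collapses the repeated rows, so the counts do not matter.
labels <- df2 %>%
  filter(order == 1) %>%
  distinct(grouping, var) %>%
  mutate(label = paste0(var, "first")) %>%
  select(grouping, label)

# Attach the per-group label to every row of its grouping
labelled <- df2 %>% left_join(labels, by = "grouping")
```

Computing a small per-group table and joining it back keeps the labeling rule in one place, and it is easy to inspect `labels` on its own.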
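library(tidyverse)
df %>%
  group_by(grouping) %>%
  # collapse the group's unique values in order of appearance (e.g. "PGB"),
  # drop everything except G and B, then keep the first remaining letter
  mutate(label = paste0(substr(gsub("[^GB]", "", paste(unique(var), collapse = "")), 1, 1), "first"))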
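The answer above reads the sequence from row position (`unique(var)` keeps values in order of first appearance), so it assumes the rows are already sorted within each group. A minimal sketch that relies on the order column instead, assuming, as the question states, that order == 1 always marks whichever of B or G comes first. The rows are shuffled here only to show that row position no longer matters.

```r
library(dplyr)

df <- data.frame(grouping = c(rep("site1_1", 9), rep("site2_1", 9)),
                 var = c(rep("P", 3), rep("G", 3), rep("B", 3), rep("P", 3), rep("B", 3), rep("G", 3)),
                 order = c(rep(0, 3), rep(1, 3), rep(2, 3), rep(0, 3), rep(1, 3), rep(2, 3)),
                 stringsAsFactors = FALSE)

# Shuffle the rows so the result cannot depend on row position
set.seed(42)
shuffled <- df[sample(nrow(df)), ]

res <- shuffled %>%
  group_by(grouping) %>%
  # whichever var has order == 1 comes right after P; the single value is
  # recycled to every row of the group
  mutate(label = paste0(first(var[order == 1]), "first")) %>%
  ungroup()
```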
One solution using dplyr and ifelse:
library(dplyr)
df %>% group_by(grouping) %>%
mutate(label = ifelse(var[var!="P"][1] == "B","BFirst","GFirst" )) %>%
as.data.frame()
# grouping var order label
# 1 site1_1 P 0 GFirst
# 2 site1_1 P 0 GFirst
# 3 site1_1 P 0 GFirst
# 4 site1_1 G 1 GFirst
# 5 site1_1 G 1 GFirst
# 6 site1_1 G 1 GFirst
# 7 site1_1 B 2 GFirst
# 8 site1_1 B 2 GFirst
# 9 site1_1 B 2 GFirst
# 10 site2_1 P 0 BFirst
# 11 site2_1 P 0 BFirst
# 12 site2_1 P 0 BFirst
# 13 site2_1 B 1 BFirst
# 14 site2_1 B 1 BFirst
# 15 site2_1 B 1 BFirst
# 16 site2_1 G 2 BFirst
# 17 site2_1 G 2 BFirst
# 18 site2_1 G 2 BFirst
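The same logic also works without dplyr. This is a base R sketch using `ave()`, which applies a function within each grouping and returns a vector aligned with the original rows. Like the answer above, it assumes that the rows are sorted within each group. `first_non_p` is an illustrative name.

```r
df <- data.frame(grouping = c(rep("site1_1", 9), rep("site2_1", 9)),
                 var = c(rep("P", 3), rep("G", 3), rep("B", 3), rep("P", 3), rep("B", 3), rep("G", 3)),
                 order = c(rep(0, 3), rep(1, 3), rep(2, 3), rep(0, 3), rep(1, 3), rep(2, 3)),
                 stringsAsFactors = FALSE)

# ave() splits var by grouping, applies FUN to each piece, and puts the
# results back in the original row positions
df$label <- ave(
  df$var, df$grouping,
  FUN = function(v) {
    first_non_p <- v[v != "P"][1]                 # first B or G in row order
    rep(paste0(first_non_p, "First"), length(v))  # same label for every row
  }
)
```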