How to create a new column based on information from multiple columns in R?
I have a data frame, data, that looks like this:
data <- structure(list(ID = c("POTR_001341", "POTR_001341", "POTR_156376",
"POTR_001106", "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156",
"POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406", "POTR_233767",
"POTR_233767"), label = c("non", "co", "co", "non", "non", "co",
"co", "non", "co", "non", "non", "co", "co", "non"), num = c(20,
2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7)), row.names = c(NA,
-14L), class = "data.frame")
I want to create a fourth column based on the information in the three existing columns.
If an ID is duplicated, like POTR_001341, compare the num values of its rows: the row with the larger num gets its label in new_column, and the other row is left empty. For POTR_001341, non has num 20 and co has num 2, so new_column is non on the first row and empty on the second.
If an ID is not duplicated, new_column gets that row's label.
If an ID is duplicated and its non and co rows have the same value in num, new_column gets common on both rows.
The final output should apply these three rules to every row of data.
We may use
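library(dplyr)
data %>%
  group_by(ID) %>%
  # Note: first(label) takes the first row of each ID, so this relies on the
  # row with the larger num appearing first within each ID, as it does in data.
  mutate(label = reorder(label, num),
         new_column = if(n_distinct(num) == 1 & n_distinct(label) > 1) 'common'
                      else first(label),
         # Blank out repeated labels within an ID, but keep every 'common'.
         new_column = replace(new_column, duplicated(new_column) &
                                new_column != 'common', "")) %>%
  ungroup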
-output
# A tibble: 14 × 4
   ID          label   num new_column
   <chr>       <fct> <dbl> <chr>
 1 POTR_001341 non      20 "non"
 2 POTR_001341 co        2 ""
 3 POTR_156376 co       16 "co"
 4 POTR_001106 non       8 "non"
 5 POTR_001178 non       1 "common"
 6 POTR_001178 co        1 "common"
 7 POTR_234156 co        8 "co"
 8 POTR_234156 non       2 ""
 9 POTR_003709 co        3 "common"
10 POTR_003709 non       3 "common"
11 POTR_006406 non      25 "non"
12 POTR_006406 co        3 ""
13 POTR_233767 co        7 "common"
14 POTR_233767 non       7 "common"
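The snippet above picks first(label) within each ID, which matches the larger num only because the rows happen to be ordered that way in data. As a hedged sketch (not the answerer's code), the variant below compares each row against max(num) instead, so it should not depend on row order. The data frame shuffled and the helper column is_tie are hypothetical names introduced only for this illustration.

```r
library(dplyr)

# Two IDs from the question, with the smaller num deliberately listed first
# for POTR_001341 (hypothetical reordering, used only to test order independence).
shuffled <- data.frame(
  ID    = c("POTR_001341", "POTR_001341", "POTR_001178", "POTR_001178"),
  label = c("co", "non", "non", "co"),
  num   = c(2, 20, 1, 1),
  stringsAsFactors = FALSE
)

result <- shuffled %>%
  group_by(ID) %>%
  mutate(
    # TRUE for every row of an ID that has several rows sharing one num value
    is_tie = n() > 1 & n_distinct(num) == 1,
    # ties get "common"; otherwise only the row holding the maximum keeps its label
    new_column = ifelse(is_tie, "common",
                        ifelse(num == max(num), label, ""))
  ) %>%
  ungroup() %>%
  select(-is_tie)

print(result)
```

Because the comparison is row-wise against the group maximum, the label lands on the row with the larger num wherever that row sits inside the group.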
data <- structure(list(ID = c("POTR_001341", "POTR_001341", "POTR_156376",
"POTR_001106", "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156",
"POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406", "POTR_233767",
"POTR_233767"), label = c("non", "co", "co", "non", "non", "co",
"co", "non", "co", "non", "non", "co", "co", "non"), num = c(20,
2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7)), row.names = c(NA,
-14L), class = "data.frame")
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
data %>%
  group_by(ID) %>%
  mutate(new_column = case_when(n() > 1 && all(num == first(num)) ~ 'common',
                                num == max(num) ~ label,
                                TRUE ~ ''))
#> # A tibble: 14 × 4
#> # Groups:   ID [8]
#>    ID          label   num new_column
#>    <chr>       <chr> <dbl> <chr>
#>  1 POTR_001341 non      20 "non"
#>  2 POTR_001341 co        2 ""
#>  3 POTR_156376 co       16 "co"
#>  4 POTR_001106 non       8 "non"
#>  5 POTR_001178 non       1 "common"
#>  6 POTR_001178 co        1 "common"
#>  7 POTR_234156 co        8 "co"
#>  8 POTR_234156 non       2 ""
#>  9 POTR_003709 co        3 "common"
#> 10 POTR_003709 non       3 "common"
#> 11 POTR_006406 non      25 "non"
#> 12 POTR_006406 co        3 ""
#> 13 POTR_233767 co        7 "common"
#> 14 POTR_233767 non       7 "common"
Created on 2022-08-22 by the reprex package (v2.0.1.9000)
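For readers who prefer to avoid a dplyr dependency, the same three rules can be sketched in base R with ave(), which computes a per-ID summary and returns it aligned with the original rows. This is only an illustrative alternative, not part of either answer; demo is a hypothetical subset of the question's data, and group_size, group_max, group_min and is_tie are names made up for the sketch.

```r
# Hypothetical subset of the question's data: one unique ID, two IDs with a
# clear maximum, and one ID whose two rows tie on num.
demo <- data.frame(
  ID    = c("POTR_001341", "POTR_001341", "POTR_156376",
            "POTR_234156", "POTR_234156", "POTR_003709", "POTR_003709"),
  label = c("non", "co", "co", "co", "non", "co", "non"),
  num   = c(20, 2, 16, 8, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Per-ID summaries, repeated on every row of the ID
group_size <- ave(demo$num, demo$ID, FUN = length)
group_max  <- ave(demo$num, demo$ID, FUN = max)
group_min  <- ave(demo$num, demo$ID, FUN = min)

# A tie means the ID has several rows and they all share the same num
is_tie <- group_size > 1 & group_max == group_min

# Ties get "common"; otherwise the row holding the maximum keeps its label
demo$new_column <- ifelse(is_tie, "common",
                          ifelse(demo$num == group_max, demo$label, ""))

print(demo)
```

An ID that appears once is its own maximum, so it falls through to the second ifelse() and keeps its label, which matches the rule for non-duplicated IDs.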