How to create a new column based on information from multiple columns in R?

I have a data frame, data, that looks like this:

data <- structure(list(ID = c("POTR_001341", "POTR_001341", "POTR_156376", 
"POTR_001106", "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156", 
"POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406", "POTR_233767", 
"POTR_233767"), label = c("non", "co", "co", "non", "non", "co", 
"co", "non", "co", "non", "non", "co", "co", "non"), num = c(20, 
2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7)), row.names = c(NA, 
-14L), class = "data.frame")


I want to create a fourth column, new_column, based on the information in these three columns.

If an ID is duplicated, like POTR_001341, find the row with the larger num and put its label in new_column; leave new_column empty for the other row. For POTR_001341, the row with num 20 gets non and the row with num 2 gets an empty string.

If the ID is not duplicated, put its label in new_column. For example, POTR_156376 gets co.

If the ID is duplicated and both labels, non and co, have the same value in num, put common in new_column for both rows. For example, both rows of POTR_001178 (num 1 and 1) get common.

So the final output should contain the original three columns plus new_column, filled in according to the three rules above.

We may use dplyr:

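library(dplyr)

data %>% 
  group_by(ID) %>% 
  mutate(
    # reorder() converts label to a factor whose levels are ordered by num
    label = reorder(label, num),
    # duplicated ID with identical num across different labels: "common";
    # otherwise keep the label only on the row holding the largest num
    new_column = if (n_distinct(num) == 1 && n_distinct(label) > 1) 'common' 
      else if_else(num == max(num), as.character(label), '')
  ) %>% 
  ungroup()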

Output:

# A tibble: 14 × 4
   ID          label   num new_column
   <chr>       <fct> <dbl> <chr>     
 1 POTR_001341 non      20 "non"     
 2 POTR_001341 co        2 ""        
 3 POTR_156376 co       16 "co"      
 4 POTR_001106 non       8 "non"     
 5 POTR_001178 non       1 "common"  
 6 POTR_001178 co        1 "common"  
 7 POTR_234156 co        8 "co"      
 8 POTR_234156 non       2 ""        
 9 POTR_003709 co        3 "common"  
10 POTR_003709 non       3 "common"  
11 POTR_006406 non      25 "non"     
12 POTR_006406 co        3 ""        
13 POTR_233767 co        7 "common"  
14 POTR_233767 non       7 "common"  
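
The per-ID rule applied above can also be read in isolation. Below is a minimal base R sketch, assuming a hypothetical helper label_one_id() (a name introduced here for illustration, not part of dplyr or of the answer) that receives the label and num values of a single ID. It is only meant to show that the rule does not need to depend on the order of the rows within an ID.

```r
# Hypothetical helper: new_column values for the rows of one ID.
label_one_id <- function(label, num) {
  if (length(num) > 1 && all(num == num[1])) {
    # duplicated ID whose rows all share the same num
    rep("common", length(num))
  } else {
    # keep the label on the row with the largest num, blank elsewhere
    ifelse(num == max(num), label, "")
  }
}

label_one_id(c("non", "co"), c(20, 2))  # "non" ""
label_one_id("co", 16)                  # "co"
label_one_id(c("non", "co"), c(1, 1))   # "common" "common"
label_one_id(c("co", "non"), c(2, 8))   # "" "non"
```

The last call has the larger num in the second row, a case the sample data does not contain.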

An alternative with case_when(), using the same data as above:

library(dplyr, warn.conflicts = FALSE)

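# Conditions are checked in order: tied duplicates first, then the row
# holding the group's largest num, then every remaining row.
data %>% 
  group_by(ID) %>% 
  mutate(new_column = case_when(n() > 1 && all(num == first(num)) ~ 'common', 
                                num == max(num) ~ label, 
                                TRUE ~ ''))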
#> # A tibble: 14 × 4
#> # Groups:   ID [8]
#>    ID          label   num new_column
#>    <chr>       <chr> <dbl> <chr>     
#>  1 POTR_001341 non      20 "non"     
#>  2 POTR_001341 co        2 ""        
#>  3 POTR_156376 co       16 "co"      
#>  4 POTR_001106 non       8 "non"     
#>  5 POTR_001178 non       1 "common"  
#>  6 POTR_001178 co        1 "common"  
#>  7 POTR_234156 co        8 "co"      
#>  8 POTR_234156 non       2 ""        
#>  9 POTR_003709 co        3 "common"  
#> 10 POTR_003709 non       3 "common"  
#> 11 POTR_006406 non      25 "non"     
#> 12 POTR_006406 co        3 ""        
#> 13 POTR_233767 co        7 "common"  
#> 14 POTR_233767 non       7 "common"

Created on 2022-08-22 by the reprex package (v2.0.1.9000)
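
If dplyr is not available, the same result can probably be reached in base R. The following is a minimal sketch, assuming ave() for the per-ID summaries; the helper vectors n_rows, max_num and min_num are names introduced here for illustration and do not come from either answer.

```r
data <- data.frame(
  ID = c("POTR_001341", "POTR_001341", "POTR_156376", "POTR_001106",
         "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156",
         "POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406",
         "POTR_233767", "POTR_233767"),
  label = c("non", "co", "co", "non", "non", "co", "co", "non",
            "co", "non", "non", "co", "co", "non"),
  num = c(20, 2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7),
  stringsAsFactors = FALSE
)

# Per-ID summaries, repeated on every row of the ID
n_rows  <- ave(data$num, data$ID, FUN = length)
max_num <- ave(data$num, data$ID, FUN = max)
min_num <- ave(data$num, data$ID, FUN = min)

# Tied duplicates get "common"; otherwise only the row with the
# largest num keeps its label and the rest are left empty
data$new_column <- ifelse(
  n_rows > 1 & max_num == min_num, "common",
  ifelse(data$num == max_num, as.character(data$label), "")
)

data
```

Unlike the dplyr versions, this adds new_column to data in place and leaves label as a character column.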
