How to create a new column based on information from multiple columns in R?

Question

I have a data frame data that looks like this:

data <- structure(list(ID = c("POTR_001341", "POTR_001341", "POTR_156376", 
"POTR_001106", "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156", 
"POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406", "POTR_233767", 
"POTR_233767"), label = c("non", "co", "co", "non", "non", "co", 
"co", "non", "co", "non", "non", "co", "co", "non"), num = c(20, 
2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7)), row.names = c(NA, 
-14L), class = "data.frame")

I want to create a 4th column based on the information in the three existing columns.

If the ID is a duplicate, like POTR_001341, check which label has the bigger num and put that label in new_column, leaving the other row empty. For POTR_001341, the row with label non (num 20) gets non and the row with label co (num 2) stays empty.

If the ID is not a duplicate, put its label in new_column.

If the ID has a duplicate and both labels, non and co, have the same value in the num column, put common in new_column for both rows. For POTR_001178, both rows have num 1, so both get common.

So, the final output should contain the original three columns plus new_column, filled in according to the rules above.

Answer 1

We may use

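library(dplyr)
data %>% 
  group_by(ID) %>% 
  # reorder() turns label into a factor (levels ordered by num); row order is unchanged
  mutate(label = reorder(label, num),  
   # same num under different labels within an ID -> 'common';
   # otherwise take the label of the row with the largest num
   new_column = if(n_distinct(num) == 1 && n_distinct(label) > 1) 'common' 
     else as.character(label[which.max(num)]), 
   # blank out every row that does not hold the group's largest num
   new_column = replace(new_column, num != max(num) & 
       new_column != 'common', "")) %>% 
  ungroup()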

-output

# A tibble: 14 × 4
   ID          label   num new_column
   <chr>       <fct> <dbl> <chr>     
 1 POTR_001341 non      20 "non"     
 2 POTR_001341 co        2 ""        
 3 POTR_156376 co       16 "co"      
 4 POTR_001106 non       8 "non"     
 5 POTR_001178 non       1 "common"  
 6 POTR_001178 co        1 "common"  
 7 POTR_234156 co        8 "co"      
 8 POTR_234156 non       2 ""        
 9 POTR_003709 co        3 "common"  
10 POTR_003709 non       3 "common"  
11 POTR_006406 non      25 "non"     
12 POTR_006406 co        3 ""        
13 POTR_233767 co        7 "common"  
14 POTR_233767 non       7 "common"
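For anyone who wants the same result without dplyr, here is a minimal base R sketch of the three rules from the question. It assumes that "common" means every num within a repeated ID is equal, and it uses ave() to broadcast per-ID summaries back to each row. The helper vectors grp_size, grp_max, grp_min and is_common are names made up for this illustration.

```r
# Data rebuilt from the question so the block is self-contained
data <- data.frame(
  ID = c("POTR_001341", "POTR_001341", "POTR_156376", "POTR_001106",
         "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156",
         "POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406",
         "POTR_233767", "POTR_233767"),
  label = c("non", "co", "co", "non", "non", "co", "co", "non",
            "co", "non", "non", "co", "co", "non"),
  num = c(20, 2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7),
  stringsAsFactors = FALSE
)

# Per-ID summaries, repeated for every row of the ID (illustrative names)
grp_size <- ave(data$num, data$ID, FUN = length)
grp_max  <- ave(data$num, data$ID, FUN = max)
grp_min  <- ave(data$num, data$ID, FUN = min)

# Assumption: a repeated ID whose num values are all equal counts as "common"
is_common <- grp_size > 1 & grp_max == grp_min

# Rule order: common first, then the row holding the largest num, else empty
data$new_column <- ifelse(is_common, "common",
                          ifelse(data$num == grp_max, data$label, ""))

print(data)
```

Because each row is compared against its own group's maximum, the result does not depend on which row of an ID comes first.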

Answer 2

data <- structure(list(ID = c("POTR_001341", "POTR_001341", "POTR_156376", 
"POTR_001106", "POTR_001178", "POTR_001178", "POTR_234156", "POTR_234156", 
"POTR_003709", "POTR_003709", "POTR_006406", "POTR_006406", "POTR_233767", 
"POTR_233767"), label = c("non", "co", "co", "non", "non", "co", 
"co", "non", "co", "non", "non", "co", "co", "non"), num = c(20, 
2, 16, 8, 1, 1, 8, 2, 3, 3, 25, 3, 7, 7)), row.names = c(NA, 
-14L), class = "data.frame")

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2

data %>% 
  group_by(ID) %>% 
  mutate(new_column = case_when(n() > 1 && all(num == first(num)) ~ 'common', 
                                num == max(num) ~ label, 
                                TRUE ~ ''))
#> # A tibble: 14 × 4
#> # Groups:   ID [8]
#>    ID          label   num new_column
#>    <chr>       <chr> <dbl> <chr>     
#>  1 POTR_001341 non      20 "non"     
#>  2 POTR_001341 co        2 ""        
#>  3 POTR_156376 co       16 "co"      
#>  4 POTR_001106 non       8 "non"     
#>  5 POTR_001178 non       1 "common"  
#>  6 POTR_001178 co        1 "common"  
#>  7 POTR_234156 co        8 "co"      
#>  8 POTR_234156 non       2 ""        
#>  9 POTR_003709 co        3 "common"  
#> 10 POTR_003709 non       3 "common"  
#> 11 POTR_006406 non      25 "non"     
#> 12 POTR_006406 co        3 ""        
#> 13 POTR_233767 co        7 "common"  
#> 14 POTR_233767 non       7 "common"

Created on 2022-08-22 by the reprex package (v2.0.1.9000)
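As a quick sanity check of the case_when() logic, here is a small sketch on made-up data where the larger num sits in the second row of an ID. The toy data frame and its IDs a, b and c are invented for illustration and are not part of the question. The first condition is a single TRUE/FALSE per group and applies to all rows of that group, while num == max(num) is evaluated row by row.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: for ID "a" the larger num is in the second row
toy <- data.frame(
  ID    = c("a", "a", "b", "c", "c"),
  label = c("co", "non", "co", "non", "co"),
  num   = c(2, 20, 16, 1, 1),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(ID) %>%
  mutate(new_column = case_when(
    n() > 1 && all(num == first(num)) ~ "common",  # repeated ID, all num equal
    num == max(num)                   ~ label,     # row holding the largest num
    TRUE                              ~ ""         # every other row
  )) %>%
  ungroup()

res$new_column
# → "" "non" "co" "common" "common"
```

The label lands on the row with num 20 even though it is not the first row of its group, so this approach does not rely on the input being sorted.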

Answer 1
1 Accepted 2022-08-22 18:49:24

Answer 2
1 2022-08-22 18:53:10
