Using the dplyr library in R to "print" the name of the non-NA columns

Here is my data frame:

a <- data.frame(id=c(rep("A",2),rep("B",2)),
                x=c(rep(2,2),rep(3,2)),
                p.ABC= c(1,NA,1,1),
                p.DEF= c(NA,1,NA,NA),
                p.TAR= c(1,NA,1,1),
                p.REP= c(NA,1,1,NA),
                p.FAR= c(NA,NA,1,1))

I want to create a new character column (using mutate() from the dplyr library in R) that gives, for each row, the names of the columns that have a non-NA value (here the non-NA value is always 1). However, it should only search among the columns that start with "p.", and it should sort the names alphabetically and then concatenate them using "_" as a separator. The desired result is shown below, in the column called "name":

data.frame(id=c(rep("A",2),rep("B",2)),
                x=c(rep(2,2),rep(3,2)),
                p.ABC= c(1,NA,1,1),
                p.DEF= c(NA,1,NA,NA),
                p.TAR= c(1,NA,1,1),
                p.REP= c(NA,1,1,NA),
                p.FAR= c(NA,NA,1,1),
                name=c("ABC_TAR","DEF_REP","ABC_FAR_REP_TAR","ABC_FAR_TAR"))

I would like to emphasize that I am specifically looking for a solution using dplyr. I can already do it without dplyr, but that version is not pretty and it is slow.

Here is an option with tidyverse, where we reshape the data into 'long' format with pivot_longer, group by the row number (created with row_number()), paste the values of the 'name' column after removing the prefix part, and then bind that column to the original data.

library(dplyr)
library(stringr)
library(tidyr)
a %>% 
    mutate(rn = row_number()) %>%
    select(-id, -x) %>%
    pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
    group_by(rn) %>%
    summarise(name = str_c(str_remove(name, ".*\\."), collapse="_"), 
         .groups = 'drop') %>%
    select(-rn) %>% 
    bind_cols(a, .)

-output

# id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_TAR_REP_FAR
#4  B 3     1    NA     1    NA     1     ABC_TAR_FAR
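The output above keeps the names in column order, while the question asks for alphabetical order. The following is a minimal sketch of one way to add the sort, assuming an arrange() on the long data before collapsing is acceptable. The object names long_names and result are illustrative only, and the left_join() is a swapped-in alternative to bind_cols() so that a row with no non-NA value would still be kept (with an NA name).

```r
library(dplyr)
library(tidyr)

a <- data.frame(id = c(rep("A", 2), rep("B", 2)),
                x = c(rep(2, 2), rep(3, 2)),
                p.ABC = c(1, NA, 1, 1),
                p.DEF = c(NA, 1, NA, NA),
                p.TAR = c(1, NA, 1, 1),
                p.REP = c(NA, 1, 1, NA),
                p.FAR = c(NA, NA, 1, 1))

# Long format: one row per (row number, non-NA "p." column), prefix removed
long_names <- a %>%
  mutate(rn = row_number()) %>%
  select(rn, starts_with("p.")) %>%
  pivot_longer(cols = -rn, names_prefix = "p\\.", values_drop_na = TRUE) %>%
  arrange(rn, name) %>%                      # alphabetical within each row
  group_by(rn) %>%
  summarise(name = paste(name, collapse = "_"), .groups = "drop")

# Join back by row number instead of binding columns positionally
result <- a %>%
  mutate(rn = row_number()) %>%
  left_join(long_names, by = "rn") %>%
  select(-rn)

result$name
# "ABC_TAR" "DEF_REP" "ABC_FAR_REP_TAR" "ABC_FAR_TAR"
```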

Or use pmap

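library(purrr)
# cur_data() is deprecated as of dplyr 1.1.0; pick(starts_with('p.')) replaces it there
a %>% 
   mutate(name = pmap_chr(select(cur_data(), starts_with('p.')), ~ {
       nm1 <- c(...)   # named vector holding this row's "p." values
       str_c(str_remove(names(nm1)[!is.na(nm1)], '.*\\.'), collapse="_")}))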
#  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_TAR_REP_FAR
#4  B 3     1    NA     1    NA     1     ABC_TAR_FAR

Or use apply in base R

apply(a[-(1:2)], 1, function(x) paste(sub(".*\\.", "", 
        names(x)[!is.na(x)]), collapse="_"))
#[1] "ABC_TAR"         "DEF_REP"         "ABC_TAR_REP_FAR" "ABC_TAR_FAR"    
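The results above list the names in column order, whereas the desired output in the question is sorted alphabetically. A minimal base R sketch of the same apply idea with a sort() added, assuming the "p." columns are selected by name rather than by position (p_cols and sorted_name are illustrative names):

```r
a <- data.frame(id = c(rep("A", 2), rep("B", 2)),
                x = c(rep(2, 2), rep(3, 2)),
                p.ABC = c(1, NA, 1, 1),
                p.DEF = c(NA, 1, NA, NA),
                p.TAR = c(1, NA, 1, 1),
                p.REP = c(NA, 1, 1, NA),
                p.FAR = c(NA, NA, 1, 1))

# Keep only the columns whose names start with "p."
p_cols <- a[startsWith(names(a), "p.")]

# Per row: names of non-NA columns, prefix stripped, sorted, then pasted
sorted_name <- apply(p_cols, 1, function(x) {
  nm <- sub("^p\\.", "", names(x)[!is.na(x)])
  paste(sort(nm), collapse = "_")
})

sorted_name
# "ABC_TAR" "DEF_REP" "ABC_FAR_REP_TAR" "ABC_FAR_TAR"
```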

I think my answer may be similar to the others, but the syntax is written in tidyverse pipe style, so it may be easier to understand. If anyone feels it is a copy of theirs, I will be happy to delete it.

a %>% mutate(name = pmap_chr(select(cur_data(), starts_with('p.')), 
                         ~ names(c(...))[!is.na(c(...))] %>%
                           str_remove_all(., fixed("p.")) %>%
                           paste(., collapse = '_')
                         )
             )
  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
1  A 2     1    NA     1    NA    NA         ABC_TAR
2  A 2    NA     1    NA     1    NA         DEF_REP
3  B 3     1    NA     1     1     1 ABC_TAR_REP_FAR
4  B 3     1    NA     1    NA     1     ABC_TAR_FAR

The idea behind it is that we can use pipes inside the map/reduce family of functions, which avoids having to write a custom function beforehand or to create intermediate objects inside {}.
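For comparison, here is a rough sketch of the named-helper style that the pipe-in-lambda approach avoids. The helper name non_na_names is hypothetical and not part of any package. It also sorts the names, which the question asks for, and it selects the columns from a directly, which is an assumption made here to keep the example independent of the dplyr version.

```r
library(dplyr)
library(purrr)

a <- data.frame(id = c(rep("A", 2), rep("B", 2)),
                x = c(rep(2, 2), rep(3, 2)),
                p.ABC = c(1, NA, 1, 1),
                p.DEF = c(NA, 1, NA, NA),
                p.TAR = c(1, NA, 1, 1),
                p.REP = c(NA, 1, 1, NA),
                p.FAR = c(NA, NA, 1, 1))

# Hypothetical helper: takes a named vector, keeps the names of its non-NA
# entries, strips the "p." prefix, sorts them and joins them with "_"
non_na_names <- function(vals) {
  nm <- names(vals)[!is.na(vals)]
  paste(sort(sub("^p\\.", "", nm)), collapse = "_")
}

# pmap_chr() calls the function once per row and returns a character vector
a_named <- a %>%
  mutate(name = pmap_chr(select(a, starts_with("p.")),
                         function(...) non_na_names(c(...))))

a_named$name
# "ABC_TAR" "DEF_REP" "ABC_FAR_REP_TAR" "ABC_FAR_TAR"
```

The trade-off is mostly stylistic: the helper can be tested on its own, while the inline pipe keeps everything in one expression.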

Using rowwise:

library(dplyr)

cols <- grep('^p\\.', names(a), value = TRUE)

a %>%
  rowwise() %>%
  mutate(name = paste0(sub('^p\\.', '', 
                cols[!is.na(c_across(all_of(cols)))]), collapse = '_')) %>%
  ungroup()

#  id        x p.ABC p.DEF p.TAR p.REP p.FAR name           
#  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>          
#1 A         2     1    NA     1    NA    NA ABC_TAR        
#2 A         2    NA     1    NA     1    NA DEF_REP        
#3 B         3     1    NA     1     1     1 ABC_TAR_REP_FAR
#4 B         3     1    NA     1    NA     1 ABC_TAR_FAR                     

Updated: Special thanks to @akrun for helping me improve my code. We made a subtle modification to suppress a message produced by unnest_wider.

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

a %>%
  mutate(name = pmap(select(a, starts_with("p.")), ~ {nm1 <- names(c(...))[!is.na(c(...))]; 
  setNames(nm1, seq_along(nm1))})) %>%
  unnest_wider(name) %>%
  rowwise() %>%
  mutate(across(8:11, ~ str_remove(., fixed("p.")))) %>%
  unite(NAME, c(8:11), sep = "_", na.rm = TRUE)

# A tibble: 4 x 8
  id        x p.ABC p.DEF p.TAR p.REP p.FAR NAME           
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>          
1 A         2     1    NA     1    NA    NA ABC_TAR        
2 A         2    NA     1    NA     1    NA DEF_REP        
3 B         3     1    NA     1     1     1 ABC_TAR_REP_FAR
4 B         3     1    NA     1    NA     1 ABC_TAR_FAR
