R function for grouping rows based on pattern across columns?

Question

I would like to group the rows of a data frame based on the pattern of each row across columns. Here is a very simple example.

 df <- data.frame("gene" = 1:5, 
                 "stg 1" = c("up", "up", NA, NA, NA),
                 "stg 2" = c("up", "up", NA, NA, NA),
                 "stg 3" = c("up", "up", NA, NA, NA),
                 "stg 4" = c("down", "down", "up", "up", NA))

> df
  gene stg.1 stg.2 stg.3 stg.4
1    1    up    up    up  down
2    2    up    up    up  down
3    3  <NA>  <NA>  <NA>    up
4    4  <NA>  <NA>  <NA>    up
5    5  <NA>  <NA>  <NA>  <NA>

In this case, genes 1 and 2 would be grouped, and genes 3 and 4 would be grouped. I would like the names of the genes in each pattern group, and what the pattern is for that group. I hope that is clear. Thanks in advance!

Answer 1

Try this approach. Create a variable that collects the values across each row using c_across() and toString(). After that, convert it to a factor and add the prefix Group. to the resulting group number. Here is the code using tidyverse functions:

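library(tidyverse)
# Code
# Group by gene (one row per gene) so c_across() collects each row's pattern
dfnew <- df %>% group_by(gene) %>% 
  mutate(Var = toString(c_across(stg.1:stg.4))) %>%
  ungroup() %>%
  # Number the patterns in order of first appearance and add the Group. prefix
  mutate(Var = paste0('Group.', as.numeric(factor(Var, levels = unique(Var), ordered = TRUE))))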

Output:

# A tibble: 5 x 6
   gene stg.1 stg.2 stg.3 stg.4 Var    
  <int> <fct> <fct> <fct> <fct> <chr>  
1     1 up    up    up    down  Group.1
2     2 up    up    up    down  Group.1
3     3 NA    NA    NA    up    Group.2
4     4 NA    NA    NA    up    Group.2
5     5 NA    NA    NA    NA    Group.3

If you only need the pattern itself, try this:

#Code 2
dfnew <- df %>% group_by(gene) %>% 
  mutate(Var=toString(c_across(stg.1:stg.4)))

Output:

# A tibble: 5 x 6
# Groups:   gene [5]
   gene stg.1 stg.2 stg.3 stg.4 Var             
  <int> <fct> <fct> <fct> <fct> <chr>           
1     1 up    up    up    down  up, up, up, down
2     2 up    up    up    down  up, up, up, down
3     3 NA    NA    NA    up    NA, NA, NA, up  
4     4 NA    NA    NA    up    NA, NA, NA, up  
5     5 NA    NA    NA    NA    NA, NA, NA, NA
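
The question also asks for the gene names in each pattern group. A minimal base R sketch, assuming the stage columns all start with stg as in the example (the names stage_cols, pattern and genes_by_pattern are illustrative and not part of the answer above), could split the genes by their pattern string:

```r
# Example data from the question, with syntactic column names
df <- data.frame(gene = 1:5,
                 stg.1 = c("up", "up", NA, NA, NA),
                 stg.2 = c("up", "up", NA, NA, NA),
                 stg.3 = c("up", "up", NA, NA, NA),
                 stg.4 = c("down", "down", "up", "up", NA),
                 stringsAsFactors = FALSE)

# Assumption: every stage column name starts with "stg"
stage_cols <- grep("^stg", names(df), value = TRUE)

# One pattern string per row; paste() writes missing values as "NA"
pattern <- do.call(paste, c(df[stage_cols], sep = ", "))

# Genes in each pattern group, with groups in order of first appearance
genes_by_pattern <- split(df$gene, factor(pattern, levels = unique(pattern)))
print(genes_by_pattern)
```

Each list element is named by its pattern and holds the gene values that share it, so both pieces of information come out of one object.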

Answer 2

We can do this in a vectorized way with unite:

library(dplyr)
library(tidyr)
df %>% 
     unite(grp, starts_with('stg'), na.rm = TRUE, remove = FALSE) %>% 
     mutate(grp = match(grp, unique(grp)))
#  gene grp stg.1 stg.2 stg.3 stg.4
#1    1   1    up    up    up  down
#2    2   1    up    up    up  down
#3    3   2  <NA>  <NA>  <NA>    up
#4    4   2  <NA>  <NA>  <NA>    up
#5    5   3  <NA>  <NA>  <NA>  <NA>
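
One caveat with na.rm = TRUE: unite() drops missing values before pasting, so the position of each NA is lost and two different patterns can collapse into the same key. It makes no difference for the example data, but the hypothetical two-gene data frame below (df2, not from the question) sketches the effect, with na.rm = FALSE as one way to keep the positions:

```r
library(tidyr)

# Hypothetical data: each gene is "up" in a different stage
df2 <- data.frame(gene = 1:2,
                  stg.1 = c("up", NA),
                  stg.2 = c(NA, "up"),
                  stringsAsFactors = FALSE)

# NA values are dropped, so both rows end up with the same key
dropped <- unite(df2, grp, stg.1, stg.2, na.rm = TRUE, remove = FALSE)

# NA values are kept as the text "NA", so the keys stay distinct
kept <- unite(df2, grp, stg.1, stg.2, na.rm = FALSE, remove = FALSE)

print(dropped$grp)
print(kept$grp)
```

If the stage at which a value occurs matters for the grouping, keeping the NA placeholders is the safer choice.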

Answer 3

Although it was not specifically asked for, a data.table solution is as follows:


library(data.table)
setDT(df)
# Paste each row's stage values into one key, then number the keys by first appearance
df[, group := paste0(stg.1, stg.2, stg.3, stg.4), by = gene][, group := match(group, unique(group))]

> df
   gene stg.1 stg.2 stg.3 stg.4 group
1:    1    up    up    up  down     1
2:    2    up    up    up  down     1
3:    3  <NA>  <NA>  <NA>    up     2
4:    4  <NA>  <NA>  <NA>    up     2
5:    5  <NA>  <NA>  <NA>  <NA>     3
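
As an alternative sketch, data.table can number the groups directly with .GRP by grouping on the stage columns themselves, which avoids building a pasted key. The code below rebuilds the example data as dt (an illustrative name, as is summary_dt) and relies on data.table placing NA values in a group of their own:

```r
library(data.table)

# Example data from the question
dt <- data.table(gene = 1:5,
                 stg.1 = c("up", "up", NA, NA, NA),
                 stg.2 = c("up", "up", NA, NA, NA),
                 stg.3 = c("up", "up", NA, NA, NA),
                 stg.4 = c("down", "down", "up", "up", NA))

# .GRP is the running group counter, in order of first appearance
dt[, group := .GRP, by = .(stg.1, stg.2, stg.3, stg.4)]

# One row per group: the pattern columns plus the genes that share it
summary_dt <- dt[, .(genes = toString(gene)),
                 by = .(group, stg.1, stg.2, stg.3, stg.4)]
print(summary_dt)
```

The summary table lists each pattern once alongside its genes, which is what the question asks for.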

Answer 1: 1 (accepted), 2020-10-13 19:08:59

Answer 2: 0, 2020-10-13 23:51:10

Answer 3: 0, 2020-10-14 05:33:48
