R function for grouping rows based on patterns across columns?
I would like to group rows of a dataframe based on the pattern of each row across columns. Here is a very simple example.
df <- data.frame("gene" = 1:5,
                 "stg 1" = c("up", "up", NA, NA, NA),
                 "stg 2" = c("up", "up", NA, NA, NA),
                 "stg 3" = c("up", "up", NA, NA, NA),
                 "stg 4" = c("down", "down", "up", "up", NA))
> df
  gene stg.1 stg.2 stg.3 stg.4
1    1    up    up    up  down
2    2    up    up    up  down
3    3  <NA>  <NA>  <NA>    up
4    4  <NA>  <NA>  <NA>    up
5    5  <NA>  <NA>  <NA>  <NA>
In this case, genes 1 and 2 would be grouped, and genes 3 and 4 would be grouped. I would like the names of the genes in each pattern group, and what the pattern is for that group. I hope that is clear. Thanks in advance!
Try this approach. Create a variable that collects the values across the stage columns of each row using c_across() and toString(). After that, convert it to a factor and add the prefix Group. to its integer codes. Here is the code using tidyverse functions:
library(tidyverse)
#Code
dfnew <- df %>%
  group_by(gene) %>%  # gene is unique, so the next step runs row by row
  mutate(Var = toString(c_across(stg.1:stg.4))) %>%
  ungroup() %>%
  # number the patterns in order of first appearance
  mutate(Var = paste0('Group.', as.numeric(factor(Var, levels = unique(Var)))))
Output:
# A tibble: 5 x 6
   gene stg.1 stg.2 stg.3 stg.4 Var
  <int> <fct> <fct> <fct> <fct> <chr>
1     1 up    up    up    down  Group.1
2     2 up    up    up    down  Group.1
3     3 NA    NA    NA    up    Group.2
4     4 NA    NA    NA    up    Group.2
5     5 NA    NA    NA    NA    Group.3
If you only need the pattern itself, try this:
#Code 2
dfnew <- df %>%
  group_by(gene) %>%
  mutate(Var = toString(c_across(stg.1:stg.4)))
Output:
# A tibble: 5 x 6
# Groups:   gene [5]
   gene stg.1 stg.2 stg.3 stg.4 Var
  <int> <fct> <fct> <fct> <fct> <chr>
1     1 up    up    up    down  up, up, up, down
2     2 up    up    up    down  up, up, up, down
3     3 NA    NA    NA    up    NA, NA, NA, up
4     4 NA    NA    NA    up    NA, NA, NA, up
5     5 NA    NA    NA    NA    NA, NA, NA, NA
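The question also asks for the genes in each pattern group together with the pattern itself. One way to get that from the same idea is to build the pattern string per row and then collapse the genes by pattern. This is only a minimal sketch: the column names pattern, genes and n_genes are my own, and I build the data with stringsAsFactors = FALSE so the stage columns are plain character vectors.

```r
library(dplyr)

# Same toy data as in the question, with character (not factor) columns
df <- data.frame(
  gene  = 1:5,
  stg.1 = c("up", "up", NA, NA, NA),
  stg.2 = c("up", "up", NA, NA, NA),
  stg.3 = c("up", "up", NA, NA, NA),
  stg.4 = c("down", "down", "up", "up", NA),
  stringsAsFactors = FALSE
)

pattern_groups <- df %>%
  rowwise() %>%                                           # work one row at a time
  mutate(pattern = toString(c_across(stg.1:stg.4))) %>%   # e.g. "up, up, up, down"
  ungroup() %>%
  group_by(pattern) %>%
  summarise(genes = toString(gene),                       # genes sharing the pattern
            n_genes = n(),
            .groups = "drop")

pattern_groups
```

Each row of pattern_groups is one pattern, with the genes that share it listed in genes. Note that NA is written into the pattern as the text "NA", so a gene with no change at any stage still gets its own pattern.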
We can do this in a vectorized way with unite():
library(dplyr)
library(tidyr)
df %>%
  unite(grp, starts_with('stg'), na.rm = TRUE, remove = FALSE) %>%
  mutate(grp = match(grp, unique(grp)))
#  gene grp stg.1 stg.2 stg.3 stg.4
#1    1   1    up    up    up  down
#2    2   1    up    up    up  down
#3    3   2  <NA>  <NA>  <NA>    up
#4    4   2  <NA>  <NA>  <NA>    up
#5    5   3  <NA>  <NA>  <NA>  <NA>
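One thing to watch with na.rm = TRUE: the missing values are dropped before the values are pasted together, so two rows that differ only in where the NA sits end up with the same key. That does not matter for the example data, but it could on real data. The small sketch below uses a made-up two-gene data frame (toy is hypothetical, not from the question) to show the difference between na.rm = TRUE and na.rm = FALSE.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: both genes have a single "up", but at different stages
toy <- data.frame(
  gene  = 1:2,
  stg.1 = c(NA, "up"),
  stg.2 = c("up", NA),
  stringsAsFactors = FALSE
)

# NAs removed before pasting: the position of "up" is lost
dropped <- toy %>%
  unite(grp, starts_with("stg"), na.rm = TRUE, remove = FALSE)

# NAs kept as the text "NA": the position of "up" is preserved
kept <- toy %>%
  unite(grp, starts_with("stg"), na.rm = FALSE, remove = FALSE)

dropped$grp
kept$grp
```

If the stage at which a change happens matters for the grouping, na.rm = FALSE is probably the safer choice; the match(grp, unique(grp)) step works the same way on either key.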
Though not specifically asked for, a data.table solution goes as follows:
library(data.table)
setDT(df)
df[, group := paste0(stg.1, stg.2, stg.3, stg.4), by = gene][, group := match(group, unique(group))]
> df
   gene stg.1 stg.2 stg.3 stg.4 group
1:    1    up    up    up  down     1
2:    2    up    up    up  down     1
3:    3  <NA>  <NA>  <NA>    up     2
4:    4  <NA>  <NA>  <NA>    up     2
5:    5  <NA>  <NA>  <NA>  <NA>     3
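A possible variation on the data.table approach, in case it helps: paste() is already vectorized over rows, so the by = gene step can be skipped, and adding a separator keeps the pasted values readable. The sketch below also collapses the result into one row per group, which is what the question asks for. The names dt, stage_cols, pattern and summary_dt are my own choices, and the data is rebuilt here so the snippet runs on its own.

```r
library(data.table)

# Same toy data as in the question, built directly as a data.table
dt <- data.table(
  gene  = 1:5,
  stg.1 = c("up", "up", NA, NA, NA),
  stg.2 = c("up", "up", NA, NA, NA),
  stg.3 = c("up", "up", NA, NA, NA),
  stg.4 = c("down", "down", "up", "up", NA)
)

stage_cols <- paste0("stg.", 1:4)

# Paste the stage columns row-wise in one vectorized call (NA becomes "NA")
dt[, pattern := do.call(paste, c(.SD, sep = ", ")), .SDcols = stage_cols]

# Number the patterns in order of first appearance
dt[, group := match(pattern, unique(pattern))]

# One row per group: the pattern and the genes that share it
summary_dt <- dt[, .(genes = toString(gene)), by = .(group, pattern)]
summary_dt
```

Because by keeps groups in order of first appearance, summary_dt comes out in the same order as the group numbers.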