
dplyr medians of rows based on grouping variable

Let's say this is my example data:

gene_data=data.frame(gene=c("g1","g2","g3","g4"),
sample1=c(12,25,73,84),
sample2=c(54,65,89,97),
sample3=c(45,25,6,8),
sample4=c(23,23,45,6))

I want to get the median expression of each gene, but when I try this it does not work:

gene_data %>% group_by(gene) %>% summarise(medians=median(.))

It complains about a non-numeric column, which is the first column, "gene".
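A likely reason for the failure, sketched here in base R only: inside the pipe, `.` stands for the whole (grouped) data frame, not for the sample values of the current gene, and `median()` refuses a data frame even when every column is numeric. The snippet assumes the example data above; the `tryCatch` wrapper is only there to capture the error message instead of stopping.

```r
gene_data <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                        sample1 = c(12, 25, 73, 84),
                        sample2 = c(54, 65, 89, 97),
                        sample3 = c(45, 25, 6, 8),
                        sample4 = c(23, 23, 45, 6))

# median() stops on a data frame, even after the gene column is dropped
msg <- tryCatch(median(gene_data[-1]),
                error = function(e) conditionMessage(e))

# Flattening one row to a plain numeric vector is accepted
g1_median <- median(unlist(gene_data[1, -1]))
```

So the values of a row have to be turned into a numeric vector before `median()` sees them.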

However, this one runs without complaint:

gene_data %>% group_by(gene) %>% summarise(medians=median(sample1:sample4))

It produces the output below, but be careful: this table is WRONG, so it is not what I want.

# A tibble: 4 × 2
    gene medians
  <fctr>   <dbl>
1     g1    17.5
2     g2    24.0
3     g3    59.0
4     g4    45.0
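A note on where these numbers appear to come from: inside `summarise()`, `sample1:sample4` is not a column range. With one row per gene it is the ordinary `:` operator, which builds the integer sequence from the value in sample1 to the value in sample4, and the median of that sequence is what gets reported. The base R sketch below reproduces the table that way and compares it with the medians taken across all four samples; the helper names `s1`, `s4`, `seq_medians` and `row_medians` are made up for illustration.

```r
gene_data <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                        sample1 = c(12, 25, 73, 84),
                        sample2 = c(54, 65, 89, 97),
                        sample3 = c(45, 25, 6, 8),
                        sample4 = c(23, 23, 45, 6))

s1 <- gene_data$sample1
s4 <- gene_data$sample4

# What median(sample1:sample4) computes for each gene:
# the median of the sequence s1, s1 +/- 1, ..., s4
seq_medians <- mapply(function(a, b) median(a:b), s1, s4)

# Median across all sample columns of each row
row_medians <- apply(as.matrix(gene_data[-1]), 1, median)

print(data.frame(gene = gene_data$gene, seq_medians, row_medians))
```

The first column of results matches the table above (17.5, 24, 59, 45), while the per-gene medians over the four samples are 34, 25, 59 and 46; the agreement for g3 is a coincidence.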

But I need a general solution, and I do not want summarise_each, which applies the median across all genes, and that is wrong.

I do not necessarily know the names of the samples, so I want to get the median expression without knowing the names of the first and last sample columns.

For example,

gene_data %>% group_by(gene) %>% summarise(medians=median([the numeric columns, or column that contain something]))

Perhaps it is too easy, but I could not find how to do it with dplyr. Thanks for your help.

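We can use do(): for each gene, drop the first column (gene), flatten the remaining sample values into one vector with unlist(), and take its median.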

library(dplyr)
gene_data %>%
   group_by(gene) %>% 
   do(data.frame(medians = median(unlist(.[-1]))))
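As a possible alternative, here is a minimal sketch that selects the sample columns by type rather than by name or position. It assumes dplyr 1.0.0 or later (for `rowwise()` with `c_across()`), which is newer than the version that printed the table in the question, and it assumes one row per gene; the result name `row_medians` is only illustrative.

```r
library(dplyr)

gene_data <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                        sample1 = c(12, 25, 73, 84),
                        sample2 = c(54, 65, 89, 97),
                        sample3 = c(45, 25, 6, 8),
                        sample4 = c(23, 23, 45, 6))

row_medians <- gene_data %>%
  rowwise() %>%                                   # treat every row as its own group
  mutate(medians = median(c_across(where(is.numeric)))) %>%  # all numeric columns of the row
  ungroup() %>%
  select(gene, medians)

print(row_medians)
```

Unlike the do() version, this works row by row, so if a gene appeared on several rows it would return one median per row instead of pooling that gene's values.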
