Let's say this is my example data:
gene_data = data.frame(gene = c("g1", "g2", "g3", "g4"),
                       sample1 = c(12, 25, 73, 84),
                       sample2 = c(54, 65, 89, 97),
                       sample3 = c(45, 25, 6, 8),
                       sample4 = c(23, 23, 45, 6))
I want to get the median expression of each gene, but when I try this, it does not work:
gene_data %>% group_by(gene) %>% summarise(medians=median(.))
It complains about a non-numeric column, namely the first column, "gene".
However, this one runs without errors:
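A minimal base-R sketch of why the first attempt fails, assuming the same example data: in the pipe, . stands for the data frame that was piped in, and median() stops with an error when it is given a data frame, even one whose columns are all numeric. Flattening the columns into a plain vector with unlist() first makes median() work. The names df_result and all_values_median are only illustrative.

```r
# Recreate the example data from the question
gene_data <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                        sample1 = c(12, 25, 73, 84),
                        sample2 = c(54, 65, 89, 97),
                        sample3 = c(45, 25, 6, 8),
                        sample4 = c(23, 23, 45, 6))

# median() refuses data frames, even when every column is numeric
df_result <- tryCatch(median(gene_data[-1]), error = function(e) e)
print(inherits(df_result, "error"))  # TRUE

# Flattening the numeric columns into one vector makes median() work;
# note this is the median of all 16 sample values, not one per gene
all_values_median <- median(unlist(gene_data[-1]))
print(all_values_median)  # 35
```

This only shows the data-frame problem; getting one median per gene still needs grouping or a row-wise operation.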
gene_data %>% group_by(gene) %>% summarise(medians=median(sample1:sample4))
It produces output like this, which looks like what I want. (Be careful: THIS TABLE IS WRONG, so this is not what I want.)
# A tibble: 4 × 2
gene medians
<fctr> <dbl>
1 g1 17.5
2 g2 24.0
3 g3 59.0
4 g4 45.0
But I need a general solution, and I do not want summarise_each, which applies median column by column, and that is not what I need.
I do not necessarily know the sample names, so I want to get the median expression without knowing the names of the first and last sample columns.
For example,
gene_data %>% group_by(gene) %>% summarise(medians=median([the numeric columns, or column that contain something]))
Perhaps it is too easy, but I could not find a way to do this in dplyr. Thanks for your help.
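For comparison, a base-R sketch of the same idea, which is an alternative and not part of the question: select the columns by type with is.numeric() instead of by name, then take a row-wise median with apply(). It assumes every numeric column is a sample and that each gene occupies one row; num_cols is an illustrative name.

```r
gene_data <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                        sample1 = c(12, 25, 73, 84),
                        sample2 = c(54, 65, 89, 97),
                        sample3 = c(45, 25, 6, 8),
                        sample4 = c(23, 23, 45, 6))

# Find the numeric columns by type, so the sample names never appear
num_cols <- vapply(gene_data, is.numeric, logical(1))

# apply() over rows (MARGIN = 1) of the numeric columns gives one median per gene
gene_data$medians <- apply(gene_data[num_cols], 1, median)

gene_data
```

Here the medians are 34, 25, 59 and 46 for g1 to g4.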
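We can use do():
library(dplyr)
gene_data %>%
  group_by(gene) %>%
  # within do(), . is one gene's rows; .[-1] drops the first column (gene)
  # and unlist() flattens the sample columns into one numeric vector
  do(data.frame(medians = median(unlist(.[-1]))))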
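In dplyr 1.0.0 and later, do() is marked as superseded. A sketch of the same per-gene median using rowwise() and c_across(), assuming dplyr >= 1.0.0 and one row per gene as in the example data; where(is.numeric) picks the sample columns by type, so their names are never needed.

```r
library(dplyr)

# Example data from the question
gene_data <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                        sample1 = c(12, 25, 73, 84),
                        sample2 = c(54, 65, 89, 97),
                        sample3 = c(45, 25, 6, 8),
                        sample4 = c(23, 23, 45, 6))

# rowwise() treats each row (one gene here) as its own group;
# c_across() collects the selected columns of that row into one vector
result <- gene_data %>%
  rowwise() %>%
  mutate(medians = median(c_across(where(is.numeric)))) %>%
  ungroup() %>%
  select(gene, medians)

result
```

If a gene could appear in several rows, grouping by gene (as in the do() answer) would still be needed, since rowwise() computes one median per row.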