Pivot table excel or R
My data looks like this in Excel:
Genename ID1 ID2 ID3
Gene1 R H R
Gene1 R H R
Gene1 H R H
Gene2 H R H
Gene2 R R H
Gene2 H R R
Gene2 R R R
I would like to create a column with the total number of individuals (ID columns) that have at least one H per gene. So, it should look like this:
Genename Het
Gene1 3
Gene2 2
I have hundreds of genes, so I need an automated way to get these counts. Thanks in advance.
Try this:
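library(data.table)
data <- as.data.table(data)
# Within each gene, flag every ID column that contains at least one "H",
# then count the flagged columns (.SD holds all columns except Genename)
res <- data[, list(Het = sum(sapply(.SD, function(x) any(x == "H")))), by = Genename]
> res
# Genename Het
#1: Gene1 3
#2: Gene2 2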
We can group by 'Genename', use any to get a logical flag for each ID column that contains at least one 'H', wrap it in sum within summarise (which turns the flag into 0/1), and then use rowSums across the ID columns to get the expected output.
library(dplyr)
df1 %>%
group_by(Genename) %>%
summarise_each(funs(sum(any(.=='H')))) %>%
transmute(Genename= Genename, Het = rowSums(.[-1L]))
# Genename Het
# (chr) (dbl)
#1 Gene1 3
#2 Gene2 2
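In recent dplyr releases summarise_each() and funs() are deprecated, so here is a rough sketch of the same group / any / rowSums idea written with across(). It assumes dplyr 1.0.0 or later, and it rebuilds the example data from the question as df1 so that it can be run on its own; the result name het is just a placeholder.

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

het <- df1 %>%
  group_by(Genename) %>%
  # TRUE for each ID column that has at least one "H" within the gene
  summarise(across(everything(), ~ any(.x == "H")), .groups = "drop") %>%
  # count the TRUE flags across the ID columns
  transmute(Genename, Het = rowSums(across(-Genename)))

het
```

For the example data this should give 3 for Gene1 and 2 for Gene2, matching the expected output in the question.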
Or, as @aosmith mentioned, distinct would be an option after converting the data from 'wide' to 'long' format with gather.
library(tidyr)
gather(df1, Var1, Var2, -Genename) %>%
group_by(Genename, Var1) %>%
distinct(Var2) %>%
group_by(Genename) %>%
summarise(Het= sum(Var2=='H'))
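Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A possible equivalent of the long-format approach is sketched below, assuming tidyr 1.0.0+ and a reasonably current dplyr. The column names ID and Genotype and the result name het_long are made up for the illustration, and df1 is rebuilt from the question's example data.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

het_long <- df1 %>%
  # one row per (gene, ID, genotype) observation
  pivot_longer(-Genename, names_to = "ID", values_to = "Genotype") %>%
  # keep only the heterozygous calls
  filter(Genotype == "H") %>%
  # keep each (gene, ID) pair once, however many "H" rows it has
  distinct(Genename, ID) %>%
  # number of distinct IDs with an "H" per gene
  count(Genename, name = "Het")

het_long
```

One caveat with this variant: a gene with no 'H' in any ID is filtered out entirely, so it will be missing from the result rather than showing a count of 0.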
If we need the count of IDs with no 'H' and at least one 'R' per 'Genename':
df1 %>%
group_by(Genename) %>%
summarise_each(funs(all(.!='H') & any(.=='R'))) %>%
transmute(Genename=Genename, Het= rowSums(.[-1L]))
# Genename Het
# (chr) (dbl)
#1 Gene1 0
#2 Gene2 1
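If installing packages is not an option, the same count can probably be done in base R with aggregate. This is only a sketch: the helper no_h_some_r and the names ids, flags, res and Count are invented for the example, and df1 is again the question's data.

```r
# Example data copied from the question
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

# All columns except the gene name are treated as ID columns
ids <- setdiff(names(df1), "Genename")

# TRUE when a column has no "H" and at least one "R" within a gene
no_h_some_r <- function(x) all(x != "H") && any(x == "R")

# One row per gene, one logical flag per ID column
flags <- aggregate(df1[ids], by = list(Genename = df1$Genename), FUN = no_h_some_r)

# Count the flagged ID columns per gene
res <- data.frame(Genename = flags$Genename, Count = rowSums(flags[ids]))
res
```

Swapping the helper for function(x) any(x == "H") would give the 'at least one H' count from the original question instead.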