
Pivot table: Excel or R

My data looks like this in Excel:

Genename ID1 ID2 ID3
Gene1   R   H   R
Gene1   R   H   R
Gene1   H   R   H
Gene2   H   R   H
Gene2   R   R   H
Gene2   H   R   R
Gene2   R   R   R

I would like to create a column with the total number of individuals (the ID columns) that have at least one H per gene. The result should look like this:

Genename Het
Gene1 3
Gene2 2

I have hundreds of genes, so I need an automated way to get these counts. Thanks in advance.

Try this:

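library(data.table)
data <- as.data.table(data)
# For each gene, count the ID columns (not the rows) that contain at least one "H".
# Within a group, .SD holds every column except the grouping column Genename.
res <- data[, list(Het = sum(sapply(.SD, function(x) any(x == "H")))), by = Genename]

> res
#   Genename Het
#1:    Gene1   3
#2:    Gene2   2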
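
As a cross-check that needs no packages, here is a minimal base R sketch that counts ID columns (individuals) with at least one "H" per gene, as the question's expected output requires. It assumes the question's table is held in a data frame named df1, rebuilt by hand here; the helper name count_het is hypothetical.

```r
# Sample data from the question, rebuilt by hand (assumed to match the Excel sheet).
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one gene's rows, count the ID columns with any "H".
count_het <- function(rows) sum(colSums(rows == "H") > 0)

# Split the ID columns by gene and apply the helper to each piece.
het <- sapply(split(df1[-1], df1$Genename), count_het)
res <- data.frame(Genename = names(het), Het = unname(het))
print(res)
#   Genename Het
# 1    Gene1   3
# 2    Gene2   2
```

Gene2 comes out as 2 because ID2 is "R" in all four Gene2 rows, even though three of those rows contain an "H" somewhere.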

We can group by 'Genename', use any() within summarise to flag each ID column that contains an 'H' (sum() turns the flag into 0 or 1), and then use rowSums to add the flags across columns and get the expected output.

library(dplyr)
df1 %>% 
   group_by(Genename) %>% 
   summarise_each(funs(sum(any(.=='H')))) %>% 
   transmute(Genename= Genename, Het = rowSums(.[-1L]))
   Genename   Het
#    (chr) (dbl)
#1    Gene1     3
#2    Gene2     2

Or, as @aosmith mentioned, distinct would be an option after converting from 'wide' to 'long' format with gather.

library(tidyr)
gather(df1, Var1, Var2, -Genename) %>% 
            group_by(Genename, Var1) %>%
            distinct(Var2) %>%
            group_by(Genename) %>%
            summarise(Het= sum(Var2=='H'))
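
gather() has since been superseded by pivot_longer() in tidyr. Below is a minimal sketch of the long-format route, assuming tidyr 1.0.0 or newer and dplyr 1.0.0 or newer. The column names ID and genotype are illustrative choices, and df1 is rebuilt by hand from the question's table.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt by hand (assumed to match the Excel sheet).
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # Wide to long: one row per (gene row, ID) pair.
  pivot_longer(-Genename, names_to = "ID", values_to = "genotype") %>%
  # One flag per gene and ID: is there any "H" for this individual?
  group_by(Genename, ID) %>%
  summarise(has_h = any(genotype == "H"), .groups = "drop") %>%
  # Count the flagged individuals per gene.
  group_by(Genename) %>%
  summarise(Het = sum(has_h))

print(res)
```

Summarising a logical flag instead of filtering keeps genes that have no "H" at all in the result, with a count of 0.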

Update

If we need the count of IDs with no 'H' and at least one 'R' per 'Genename':

df1 %>% 
     group_by(Genename) %>%
     summarise_each(funs(all(.!='H') & any(.=='R'))) %>% 
     transmute(Genename=Genename, Het= rowSums(.[-1L]))
#   Genename   Het
#     (chr) (dbl)
#1    Gene1     0
#2    Gene2     1
