Pivot table in Excel or R

Question

My data looks like this in Excel:

Genename  ID1  ID2  ID3
Gene1     R    H    R
Gene1     R    H    R
Gene1     H    R    H
Gene2     H    R    H
Gene2     R    R    H
Gene2     H    R    R
Gene2     R    R    R

I would like to create a column with the total number of individuals (ID columns) that have at least one H, per gene. So it should look like this:

Genename  Het
Gene1     3
Gene2     2

I have hundreds of genes, so I need an automated way to get these counts. Thanks in advance.
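
Stated precisely, an individual (ID column) counts towards a gene if any of that gene's rows has an H in that column. A minimal base-R sketch of that definition, assuming the sheet has been read into a data frame named df1 (the name is an assumption; the example values are typed in by hand here):

```r
# Example data from the question, typed in by hand; the name df1 is an assumption
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

# For each gene and each ID column: TRUE if any of that gene's rows has an "H"
has_h <- aggregate(df1[, -1] == "H", by = list(Genename = df1$Genename), FUN = any)

# Het = number of ID columns that are TRUE for the gene
het <- data.frame(Genename = has_h$Genename, Het = rowSums(has_h[, -1]))
print(het)  # Het is 3 for Gene1 and 2 for Gene2
```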

Answer 1

Try this:

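library(data.table)
data <- data.table(data)  # 'data' is the original data.frame
# flag each row that has at least one "H"
res <- data[,list("Genename"=Genename,"Het"=rowSums(data=="H")>0)]
# count the flagged rows per gene
res <- res[,list("Het"=sum(Het)),by=Genename]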

> res
#   Genename Het
#1:    Gene1   3
#2:    Gene2   3
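
Note that this code counts rows containing at least one H, so Gene2 comes out as 3 rather than the 2 asked for: the last Gene2 row (R R R) is its only row without an H, while ID2 is its only individual without one. If the individuals are the ID columns, as the expected output suggests, a minimal data.table sketch of the column-wise count could look like this. It assumes every individual's column name starts with "ID", and the table name dt is hypothetical:

```r
library(data.table)

# Example data from the question, typed in by hand (dt is a hypothetical name)
dt <- data.table(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R")
)

# Assumption: every individual's column name starts with "ID"
id_cols <- grep("^ID", names(dt), value = TRUE)

# One row per gene; each ID column becomes TRUE if that gene has any "H" in it
per_id <- dt[, lapply(.SD, function(x) any(x == "H")), by = Genename, .SDcols = id_cols]

# Het = number of ID columns flagged TRUE
res <- per_id[, .(Genename, Het = rowSums(.SD)), .SDcols = id_cols]
print(res)
```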

Answer 2

We can group by 'Genename', check for each ID column whether any value is 'H' (a logical index), turn that into 0/1 with sum inside summarise_each, and then use rowSums across the ID columns to get the expected output.

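library(dplyr)
df1 %>% 
   group_by(Genename) %>% 
   # 1 if the ID column has at least one 'H' for this gene, else 0
   summarise_each(funs(sum(any(.=='H')))) %>% 
   # add up the 0/1 flags across the ID columns
   transmute(Genename= Genename, Het = rowSums(.[-1L]))
#   Genename   Het
#      (chr) (dbl)
#1    Gene1     3
#2    Gene2     2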
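
summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later and assuming the individual columns are the ones whose names start with "ID" (the example data is typed in by hand):

```r
library(dplyr)

# Example data from the question, typed in by hand
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Genename) %>%
  # assumption: individual columns start with "ID"; TRUE if the gene has any 'H'
  summarise(across(starts_with("ID"), ~ any(.x == "H"))) %>%
  # add up the TRUE flags (as 0/1) across the ID columns
  mutate(Het = rowSums(across(starts_with("ID"), as.integer))) %>%
  select(Genename, Het)
print(res)
```

Passing as.integer to across() gives rowSums() a plain numeric input and keeps both .cols and .fns explicit.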

Or, as @aosmith mentioned, distinct would be an option after converting the 'wide' format to 'long' with gather.

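library(tidyr)
# wide to long: one row per Genename / ID (Var1) / value (Var2)
gather(df1, Var1, Var2, -Genename) %>% 
   group_by(Genename, Var1) %>%
   # keep each value once per gene and ID
   distinct(Var2) %>%
   group_by(Genename) %>%
   # each gene/ID pair with an 'H' now contributes exactly one 'H'
   summarise(Het= sum(Var2=='H'))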
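
gather() is superseded by pivot_longer() in tidyr 1.0.0 and later. A minimal sketch of the same long-format approach with pivot_longer(); the column names ID and Call are illustrative choices, and the example data is typed in by hand:

```r
library(dplyr)
library(tidyr)

# Example data from the question, typed in by hand
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # wide to long: one row per gene, ID and genotype call
  pivot_longer(-Genename, names_to = "ID", values_to = "Call") %>%
  # keep each call only once per gene and ID
  distinct(Genename, ID, Call) %>%
  group_by(Genename) %>%
  # each gene/ID pair with at least one 'H' now contributes exactly one 'H' row
  summarise(Het = sum(Call == "H"))
print(res)
```

Keeping all distinct calls (rather than filtering to 'H' first) means a gene with no H at all still appears, with Het = 0.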

Update

If we need the count of IDs with no 'H' and at least one 'R' per 'Genename':

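df1 %>% 
     group_by(Genename) %>%
     # TRUE if the ID column has no 'H' but at least one 'R' for this gene
     summarise_each(funs(all(.!='H') & any(.=='R'))) %>% 
     # count the TRUE flags across the ID columns
     transmute(Genename=Genename, Het= rowSums(.[-1L]))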
#   Genename   Het
#     (chr) (dbl)
#1    Gene1     0
#2    Gene2     1
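
The same "no H but at least one R" count, sketched with across() instead of the deprecated summarise_each()/funs(). This assumes dplyr 1.0.0 or later and individual columns starting with "ID"; the result column name n_no_H is a hypothetical choice, and the example data is typed in by hand:

```r
library(dplyr)

# Example data from the question, typed in by hand
df1 <- data.frame(
  Genename = c("Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2", "Gene2"),
  ID1 = c("R", "R", "H", "H", "R", "H", "R"),
  ID2 = c("H", "H", "R", "R", "R", "R", "R"),
  ID3 = c("R", "R", "H", "H", "H", "R", "R"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Genename) %>%
  # TRUE if the ID column has no 'H' but at least one 'R' for this gene
  summarise(across(starts_with("ID"), ~ all(.x != "H") & any(.x == "R"))) %>%
  # count the qualifying ID columns (n_no_H is a hypothetical name)
  mutate(n_no_H = rowSums(across(starts_with("ID"), as.integer))) %>%
  select(Genename, n_no_H)
print(res)
```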

Answer 1: score 2, 2015-11-19 19:25:27
Answer 2: score 1, accepted, 2015-11-19 19:08:10
