
Aggregating columns of a data frame

I have a data.frame as follows:

>data
    ID     Orginal   Modified
    Sam_1    M         K
    Sam_1    K         M
    Sam_1    I         J
    Sam_1    M         K
    Sam_1    K         M
    Sam_2    K         M
    Sam_2    M         K
    Sam_3    J         P
    Sam_4    K         M
    Sam_4    M         K
    Sam_4    P         J 

For every sample, I would like to count the number of times "M" in column "Orginal" is converted to "K" in column "Modified", and the number of times "K" in column "Orginal" is converted to "M" in column "Modified", and report the counts in a tab-delimited text file as follows:

>newdata
    ID     M_to_K_counts  K_to_M_counts 
    Sam_1     2                2 
    Sam_2     1                1
    Sam_3     0                0
    Sam_4     1                1

I tried the following code, but it failed:

counts=function()
{
for(i in 1:dim(rnaseqmut)[1])
{
  mk_counts=0
  km_counts=0
  if(data$Original[i]=='M' & data$Modified[i]== 'K')
    {
       mk_counts=mk_counts+1
    }
  if(data$Original[i]=='K' & data$Modified[i]== 'M')
    {
       km_counts=km_counts+1
    }
}
print(mk_counts)
print(km_counts)
}

How can I achieve my desired format?
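The loop in the question fails for reasons visible in the code itself: both counters are reset to 0 on every iteration, nothing is grouped by ID, the loop bound uses `rnaseqmut` (presumably the asker's real object) while the body uses `data`, and the column shown is spelled `Orginal`, so `data$Original` is NULL. A minimal corrected sketch of the same loop idea follows, assuming the sample data above; the function signature and the result column names are choices made here for illustration, not part of the original post.

```r
# Sample data from the question ('Orginal' is spelled as in the post)
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

counts <- function(df) {
  ids <- unique(as.character(df$ID))
  # One counter per ID, initialised once, outside the loop
  mk <- setNames(integer(length(ids)), ids)
  km <- setNames(integer(length(ids)), ids)
  for (i in seq_len(nrow(df))) {
    id <- as.character(df$ID[i])
    if (df$Orginal[i] == "M" && df$Modified[i] == "K") mk[id] <- mk[id] + 1L
    if (df$Orginal[i] == "K" && df$Modified[i] == "M") km[id] <- km[id] + 1L
  }
  data.frame(ID = ids,
             M_to_K_counts = unname(mk),
             K_to_M_counts = unname(km),
             stringsAsFactors = FALSE)
}

res <- counts(data)
print(res)
```

The explicit loop is kept only to mirror the original attempt; vectorised, grouped approaches are shorter and usually faster.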

One option is to use data.table. Convert the 'data.frame' to a 'data.table' ( setDT(data) ). Grouped by the 'ID' column, we take the sum of the rows where 'Orginal' is 'M' and 'Modified' is 'K' ('MtoKcount'); 'KtoMcount' is obtained the same way with the two letters reversed.

library(data.table)
setDT(data)[, list(MtoKcount=sum(Orginal=='M' & Modified=='K'),
               KtoMcount = sum(Orginal=='K' & Modified=='M')), by =  ID]
#       ID MtoKcount KtoMcount
#1: Sam_1         2         2
#2: Sam_2         1         1
#3: Sam_3         0         0
#4: Sam_4         1         1

Another option is table from base R. We paste the columns other than the 'ID' column ( do.call(paste0, data[-1]) ) and get the frequency counts using table. Then we subset the table output ('tbl') to keep only the columns named 'KM' and 'MK'.

 tbl <- table(data$ID,do.call(paste0, data[-1]))[,c('KM', 'MK')]
 tbl
 #      KM MK
 #Sam_1  2  2
 #Sam_2  1  1
 #Sam_3  0  0
 #Sam_4  1  1

As @user295691 mentioned in the comments, we can set the column names while pasting.

  tbl <- with(data, table(ID, paste0(Orginal, "_to_", Modified,"_counts"))) 
  tbl[,c('K_to_M_counts', 'M_to_K_counts')]
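The question asks for a tab-delimited text file with an ID column, which a table object is not. A minimal sketch of that last step in base R follows; `out_file` is a placeholder temporary path chosen here, not something from the original post.

```r
# Sample data from the question ('Orginal' is spelled as in the post)
data <- data.frame(
  ID       = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)),
  Orginal  = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"),
  Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID against the pasted "<from>_to_<to>_counts" labels
tbl <- with(data, table(ID, paste0(Orginal, "_to_", Modified, "_counts")))

# Pull the two columns of interest into a plain data frame with an ID column
newdata <- data.frame(
  ID            = rownames(tbl),
  M_to_K_counts = as.vector(tbl[, "M_to_K_counts"]),
  K_to_M_counts = as.vector(tbl[, "K_to_M_counts"]),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file without quotes or row names
out_file <- tempfile(fileext = ".txt")  # placeholder path (assumption)
write.table(newdata, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Reading the file back with read.delim(out_file) is a quick way to check the layout.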

data

data <- structure(list(ID = c("Sam_1", "Sam_1", "Sam_1", "Sam_1", 
"Sam_1", 
"Sam_2", "Sam_2", "Sam_3", "Sam_4", "Sam_4", "Sam_4"), Orginal = c("M", 
"K", "I", "M", "K", "K", "M", "J", "K", "M", "P"), Modified = c("K", 
"M", "J", "K", "M", "M", "K", "P", "M", "K", "J")), .Names = c("ID", 
"Orginal", "Modified"), class = "data.frame", row.names = c(NA, 
-11L))

Base R using xtabs. Getting the desired shape and subset requires transposing and some fiddling with container classes.

d<-as.matrix(ftable(xtabs(Count~Orginal+Modified+ID,transform(data,Count=1))))
as.data.frame(t(d))[,c("M_K","K_M")]
#       M_K K_M
# Sam_1   2   2
# Sam_2   1   1
# Sam_3   0   0
# Sam_4   1   1

Using dplyr

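library(dplyr)

x <- data.frame(ID = c(rep("Sam_1", 5), rep("Sam_2", 2), "Sam_3", rep("Sam_4", 3)), 
 Orginal = c("M", "K", "I", "M", "K", "K", "M", "J", "K", "M", "P"), 
 Modified = c("K", "M", "J", "K", "M", "M", "K", "P", "M", "K", "J"))

# Count, per ID, the rows where both conditions hold.
# sum() over the combined condition is used because length((a)[b]) would
# count every row where b is TRUE, whatever the value of a.
x %>%
   group_by(ID) %>%
   summarise(M_to_K_counts = sum(Orginal == "M" & Modified == "K"), 
             K_to_M_counts = sum(Orginal == "K" & Modified == "M"))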

# Source: local data frame [4 x 3]

#      ID M_to_K_counts K_to_M_counts
# 1 Sam_1             2             2
# 2 Sam_2             1             1
# 3 Sam_3             0             0
# 4 Sam_4             1             1
