
How do I count the number of rows summed in 1 column, based on matching values in 3 other columns, in R?

I have a table that looks like this:

> dt
                      variant_id           transcript_id
       1:  chr1_37492738_T_C_b38  chr1_37557076_37557602
       2:  chr1_37492738_T_C_b38  chr1_37557076_37557602
       3:  chr1_37492738_T_C_b38  chr1_37557076_37557602
       4:  chr1_37492738_T_C_b38  chr1_37557076_37557602
       5:  chr1_37492738_T_C_b38  chr1_37557076_37557602
      ---
13527497: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527498: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527499: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527500: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527501: chr22_49950090_T_G_b38 chr22_49925558_49927254
                         tissue_id counts individual is_NL
       1: GTEX-11DXX-1426-SM-5GIDU     46 GTEX-11DXX     0
       2: GTEX-11EM3-1726-SM-5N9D1     54 GTEX-11EM3     0
       3: GTEX-11EMC-1726-SM-5H11P     61 GTEX-11EMC     0
       4: GTEX-11GSP-0226-SM-5A5KV     44 GTEX-11GSP     0
       5: GTEX-11I78-1926-SM-59878     27 GTEX-11I78     0
      ---
13527497:  GTEX-ZVT2-0326-SM-5E44G    110  GTEX-ZVT2     1
13527498:  GTEX-ZVT3-2626-SM-5GU5L     54  GTEX-ZVT3     1
13527499:  GTEX-ZYFG-1726-SM-5GZZB     66  GTEX-ZYFG     1
13527500:  GTEX-ZYY3-2726-SM-5EGH4     96  GTEX-ZYY3     1
13527501:  GTEX-ZZPU-2126-SM-5EGIU     75  GTEX-ZZPU     0

I was able to sum the values in dt$counts with this line: dt2 <- as.data.table(ddply(dt, c("variant_id", "transcript_id", "is_NL"), numcolwise(sum)))

which makes the result look like this:

> dt2
                     variant_id             transcript_id is_NL counts
     1: chr10_125381862_C_T_b38 chr10_124989699_124992694     0  30610
     2: chr10_125381862_C_T_b38 chr10_124989699_124992694     1   1932
     3: chr10_125381862_C_T_b38 chr10_124992813_124993201     0  28215
     4: chr10_125381862_C_T_b38 chr10_124992813_124993201     1   1706
     5: chr10_125381862_C_T_b38 chr10_124993330_124993854     0  17974
    ---
232637:   chr9_92815645_A_C_b38    chr9_92517876_92522574     1   2009
232638:   chr9_92815645_A_C_b38    chr9_92522894_92535932     0  10026
232639:   chr9_92815645_A_C_b38    chr9_92522894_92535932     1   1454
232640:   chr9_92815645_A_C_b38    chr9_92535983_92536600     0   2495
232641:   chr9_92815645_A_C_b38    chr9_92535983_92536600     1    341

However, I would also like a column to the right that gives the number of rows over which the values in dt2$counts were summed, i.e. the number of samples per group of matching c("variant_id", "transcript_id", "is_NL"). How would I go about doing this? I could figure it out if it were Python or Java, but with R I don't know where I would even start.

Using dplyr:

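library(dplyr)
# Group by the three key columns (note the column is is_NL, not is_nl),
# then sum counts and record how many rows went into each sum.
dt2 <- dt %>%
      group_by(variant_id, transcript_id, is_NL) %>%
      summarise(counts = sum(counts), nrows = n())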

You can also just use the aggregate function, which does not require any library:

dt2 <- aggregate(counts ~ variant_id+transcript_id+is_NL, dt, function(x) c(sum = sum(x), n = length(x)))
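One detail worth knowing about the aggregate call above: when the summary function returns a named vector, aggregate stores the result as a single matrix column rather than two ordinary columns. The sketch below uses a small made-up data frame (toy and its values are illustrative, not the asker's data) and flattens the matrix with do.call(data.frame, ...), which is one common way to do it.

```r
# Toy data, invented for illustration; column names mirror the question.
toy <- data.frame(
  variant_id    = c("v1", "v1", "v1", "v2", "v2"),
  transcript_id = c("t1", "t1", "t1", "t2", "t2"),
  is_NL         = c(0, 0, 1, 1, 1),
  counts        = c(46, 54, 61, 44, 27)
)

# Same pattern as the answer: sum and row count per group.
agg <- aggregate(counts ~ variant_id + transcript_id + is_NL, toy,
                 function(x) c(sum = sum(x), n = length(x)))

# agg$counts is a two-column matrix (columns "sum" and "n").
# Passing the columns to data.frame() expands it into
# ordinary columns named counts.sum and counts.n.
flat <- do.call(data.frame, agg)
print(flat)
```

After flattening, counts.sum and counts.n behave like any other columns, so they can be renamed or used in later filtering steps.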

With data.table, it is:

setDT(dt)[, .(counts=sum(counts), .N), .(variant_id, transcript_id, is_NL)]
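To see what the data.table one-liner returns, here is a minimal sketch on a small made-up table (toy and its values are illustrative only). It names the row-count column n_rows explicitly, which is a choice made here; in the answer's version the unnamed .N column is called N. The final check relies on the fact that the per-group row counts should add up to the number of input rows.

```r
library(data.table)

# Toy data, invented for illustration; column names mirror the question.
toy <- data.table(
  variant_id    = c("v1", "v1", "v1", "v2", "v2"),
  transcript_id = c("t1", "t1", "t1", "t2", "t2"),
  is_NL         = c(0, 0, 1, 1, 1),
  counts        = c(46, 54, 61, 44, 27)
)

# Sum counts and record the number of rows (.N) in each group.
res <- toy[, .(counts = sum(counts), n_rows = .N),
           by = .(variant_id, transcript_id, is_NL)]
print(res)

# Sanity check: every input row belongs to exactly one group.
stopifnot(sum(res$n_rows) == nrow(toy))
```

On the asker's real table the same check would be sum(dt2$N) == nrow(dt), assuming the result is stored in dt2 with the default column name N.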

