How do I count the number of rows that were summed in one column, grouped by matching values in three other columns [R]?
I have a table that looks like this:
> dt
variant_id transcript_id
1: chr1_37492738_T_C_b38 chr1_37557076_37557602
2: chr1_37492738_T_C_b38 chr1_37557076_37557602
3: chr1_37492738_T_C_b38 chr1_37557076_37557602
4: chr1_37492738_T_C_b38 chr1_37557076_37557602
5: chr1_37492738_T_C_b38 chr1_37557076_37557602
---
13527497: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527498: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527499: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527500: chr22_49950090_T_G_b38 chr22_49925558_49927254
13527501: chr22_49950090_T_G_b38 chr22_49925558_49927254
tissue_id counts individual is_NL
1: GTEX-11DXX-1426-SM-5GIDU 46 GTEX-11DXX 0
2: GTEX-11EM3-1726-SM-5N9D1 54 GTEX-11EM3 0
3: GTEX-11EMC-1726-SM-5H11P 61 GTEX-11EMC 0
4: GTEX-11GSP-0226-SM-5A5KV 44 GTEX-11GSP 0
5: GTEX-11I78-1926-SM-59878 27 GTEX-11I78 0
---
13527497: GTEX-ZVT2-0326-SM-5E44G 110 GTEX-ZVT2 1
13527498: GTEX-ZVT3-2626-SM-5GU5L 54 GTEX-ZVT3 1
13527499: GTEX-ZYFG-1726-SM-5GZZB 66 GTEX-ZYFG 1
13527500: GTEX-ZYY3-2726-SM-5EGH4 96 GTEX-ZYY3 1
13527501: GTEX-ZZPU-2126-SM-5EGIU 75 GTEX-ZZPU 0
I managed to sum the values of dt$counts with the following line (ddply and numcolwise come from plyr):
dt2 <- as.data.table(ddply(dt, c("variant_id", "transcript_id", "is_NL"), numcolwise(sum)))
which gives a result that looks like this:
> dt2
variant_id transcript_id is_NL counts
1: chr10_125381862_C_T_b38 chr10_124989699_124992694 0 30610
2: chr10_125381862_C_T_b38 chr10_124989699_124992694 1 1932
3: chr10_125381862_C_T_b38 chr10_124992813_124993201 0 28215
4: chr10_125381862_C_T_b38 chr10_124992813_124993201 1 1706
5: chr10_125381862_C_T_b38 chr10_124993330_124993854 0 17974
---
232637: chr9_92815645_A_C_b38 chr9_92517876_92522574 1 2009
232638: chr9_92815645_A_C_b38 chr9_92522894_92535932 0 10026
232639: chr9_92815645_A_C_b38 chr9_92522894_92535932 1 1454
232640: chr9_92815645_A_C_b38 chr9_92535983_92536600 0 2495
232641: chr9_92815645_A_C_b38 chr9_92535983_92536600 1 341
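However, I would also like a column on the right that shows how many rows were summed to produce each value of dt2$counts, i.e. the number of samples in each group matching c("variant_id", "transcript_id", "is_NL"). How can I do that? I could figure this out in Python or Java, but unfortunately with R I don't even know where to start.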
Using dplyr:
library(dplyr)
dt2 <- dt %>%
  group_by(variant_id, transcript_id, is_NL) %>%   # column is is_NL, not is_nl
  summarise(counts = sum(counts), nrows = n())     # n() counts the rows in each group
You can also just use the aggregate function, which doesn't require any library:
dt2 <- aggregate(counts ~ variant_id+transcript_id+is_NL, dt, function(x) c(sum = sum(x), n = length(x)))
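One thing to watch with the aggregate version: because the function returns a vector of length two, the counts column of the result is a two-column matrix rather than two ordinary columns. Below is a minimal sketch of flattening it with do.call(data.frame, ...). The toy table and the names agg and flat are assumptions made up for illustration, not the asker's data.

```r
# Toy data standing in for dt (values invented for illustration)
dt <- data.frame(
  variant_id    = c("v1", "v1", "v1", "v2", "v2"),
  transcript_id = c("t1", "t1", "t1", "t2", "t2"),
  is_NL         = c(0, 0, 1, 1, 1),
  counts        = c(46, 54, 61, 44, 27)
)

# Same call as in the answer: one sum and one row count per group
agg <- aggregate(counts ~ variant_id + transcript_id + is_NL, dt,
                 function(x) c(sum = sum(x), n = length(x)))

# agg$counts is a matrix with columns "sum" and "n";
# do.call(data.frame, ...) expands it into counts.sum and counts.n
flat <- do.call(data.frame, agg)
print(flat)
```

After flattening, counts.sum and counts.n behave like any other columns, which is usually easier to work with downstream.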
In data.table, it is:
setDT(dt)[, .(counts=sum(counts), .N), .(variant_id, transcript_id, is_NL)]
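To see what the data.table one-liner produces, here is a small sketch on made-up data. The toy values and the column name n_samples are assumptions for illustration; in the answer's version the count column is simply named N by default.

```r
library(data.table)

# Toy data standing in for dt (values invented for illustration)
dt <- data.table(
  variant_id    = c("v1", "v1", "v1", "v2", "v2"),
  transcript_id = c("t1", "t1", "t1", "t2", "t2"),
  is_NL         = c(0, 0, 1, 1, 1),
  counts        = c(46, 54, 61, 44, 27)
)

# .N is the number of rows in the current group, so each output row
# carries both the summed counts and how many rows went into that sum
dt2 <- dt[, .(counts = sum(counts), n_samples = .N),
          by = .(variant_id, transcript_id, is_NL)]
print(dt2)
```

Note that .N counts rows. If the same individual could appear more than once within a group and distinct individuals are wanted instead, that would need a different expression.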