How do I make column values into new columns in data table?
I have a table that looks like this:
> dt
variant_id transcript_id is_NL counts nrows
1: chr10_60842447_A_G_b38 chr10_60871326_60871443 0 32968 685
2: chr10_60842447_A_G_b38 chr10_60871326_60871443 1 1440 20
3: chr10_60842447_A_G_b38 chr10_60871326_60871443 2 337 1
4: chr10_60846892_G_A_b38 chr10_60871326_60871443 0 33157 690
5: chr10_60846892_G_A_b38 chr10_60871326_60871443 1 1251 15
---
227: chr5_96832353_G_T_b38 chr5_96727531_96729611 1 33504 572
228: chr5_96832353_G_T_b38 chr5_96727531_96729611 2 3352 52
229: chr5_96834213_T_G_b38 chr5_96727531_96729611 0 110144 2208
230: chr5_96834213_T_G_b38 chr5_96727531_96729611 1 33252 564
231: chr5_96834213_T_G_b38 chr5_96727531_96729611 2 3352 52
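I'd like to take the values of is_NL and spread them into separate columns (e.g. is_NL_0, is_NL_1, is_NL_2) and, for now, fill them with the counts and nrows values separated by a semicolon (e.g. 32968;685). I have been using tidyr's pivot_wider to do this, but since I have no experience with this package, I'm running into some trouble: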
> dt %>% pivot_wider(-c(transcript_id, variant_id), names_from = "is_NL", values_from = paste0(dt$counts, ";", dt$nrows), names_prefix = "NL_") %>% as.data.table
Error: Unknown columns `32968;685`, `1440;20`, `337;1`, `33157;690`, `1251;15` and ...
Run `rlang::last_error()` to see where the error occurred.
I'll keep working on this, but I'd like to know how to do it in a way that makes sense.
I'm not familiar with tidyr, but you could do this:
dt[, tmp := paste(counts, nrows, sep = ";")
][, dcast(.SD, transcript_id + variant_id ~ is_NL, value.var = "tmp")]
transcript_id variant_id 0 1 2
1: chr10_60871326_60871443 chr10_60842447_A_G_b38 32968;685 1440;20 337;1
2: chr10_60871326_60871443 chr10_60846892_G_A_b38 33157;690 1251;15 <NA>
3: chr5_96727531_96729611 chr5_96832353_G_T_b38 <NA> 33504;572 3352;52
4: chr5_96727531_96729611 chr5_96834213_T_G_b38 110144;2208 33252;564 3352;52
Data
library(data.table)
dt <- fread(" variant_id transcript_id is_NL counts nrows
chr10_60842447_A_G_b38 chr10_60871326_60871443 0 32968 685
chr10_60842447_A_G_b38 chr10_60871326_60871443 1 1440 20
chr10_60842447_A_G_b38 chr10_60871326_60871443 2 337 1
chr10_60846892_G_A_b38 chr10_60871326_60871443 0 33157 690
chr10_60846892_G_A_b38 chr10_60871326_60871443 1 1251 15
chr5_96832353_G_T_b38 chr5_96727531_96729611 1 33504 572
chr5_96832353_G_T_b38 chr5_96727531_96729611 2 3352 52
chr5_96834213_T_G_b38 chr5_96727531_96729611 0 110144 2208
chr5_96834213_T_G_b38 chr5_96727531_96729611 1 33252 564
chr5_96834213_T_G_b38 chr5_96727531_96729611 2 3352 52")
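The dcast approach above names the new columns 0, 1 and 2 and leaves the helper column tmp in dt, because := modifies the table by reference. A minimal sketch of a variant that works on a copy and renames the new columns to the is_NL_0 style asked for in the question is given below. The names value, wide and level_cols are illustrative only, and the data is a five-row subset of the question's table so the block runs on its own.

```r
library(data.table)

# Five rows from the question's table, rebuilt here so the block is self-contained
dt <- data.table(
  variant_id    = c(rep("chr10_60842447_A_G_b38", 3), rep("chr10_60846892_G_A_b38", 2)),
  transcript_id = "chr10_60871326_60871443",
  is_NL         = c(0L, 1L, 2L, 0L, 1L),
  counts        = c(32968L, 1440L, 337L, 33157L, 1251L),
  nrows         = c(685L, 20L, 1L, 690L, 15L)
)

# Work on a copy so the original table does not gain a helper column
tmp <- copy(dt)[, value := paste(counts, nrows, sep = ";")]

# One row per transcript/variant pair, one column per is_NL level
wide <- dcast(tmp, transcript_id + variant_id ~ is_NL, value.var = "value")

# Rename the level columns "0", "1", "2" to "is_NL_0", "is_NL_1", "is_NL_2"
level_cols <- setdiff(names(wide), c("transcript_id", "variant_id"))
setnames(wide, level_cols, paste0("is_NL_", level_cols))

print(wide)
```

Combinations that do not occur in the data (here is_NL = 2 for the second variant) come out as NA, as in the output shown above.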
This should work for your case.
library(tidyverse)
df_example <- tibble::tribble(~variant_id,~transcript_id, ~is_NL, ~counts, ~ nrows,
"chr10_60842447_A_G_b38", "chr10_60871326_60871443", 0, 32968, 685,
"chr10_60842447_A_G_b38", "chr10_60871326_60871443", 1 , 1440 , 20,
"chr10_60842447_A_G_b38" ,"chr10_60871326_60871443", 2, 337 , 1,
"chr10_60846892_G_A_b38" ,"chr10_60871326_60871443", 0 , 33157 ,690,
"chr10_60846892_G_A_b38" ,"chr10_60871326_60871443", 1 , 1251 ,15)
df_example %>%
mutate(counts = counts %>% as.character(),
nrows = nrows %>% as.character()) %>%
unite("result",counts,nrows,sep = ";") %>%
pivot_wider(names_from = is_NL,values_from = result)
# A tibble: 2 x 5
variant_id transcript_id `0` `1` `2`
<chr> <chr> <chr> <chr> <chr>
1 chr10_60842447_A_G_b38 chr10_60871326_60871443 32968;685 1440;20 337;1
2 chr10_60846892_G_A_b38 chr10_60871326_60871443 33157;690 1251;15 NA
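As for why the attempt in the question fails: values_from expects the names of existing columns, but paste0(dt$counts, ";", dt$nrows) hands it a character vector of values, which tidyr then tries to look up as column names, hence the "Unknown columns" error. The combined value has to exist as a column before pivoting. The sketch below is one way to do that while also keeping the names_prefix idea from the question. The names df, result and wide are illustrative, and counts and nrows are dropped before pivoting because, with the default id columns, every column not used in names_from or values_from identifies a row.

```r
library(dplyr)
library(tidyr)

# Same five rows as in the answer above, rebuilt so the block is self-contained
df <- tibble::tribble(
  ~variant_id,              ~transcript_id,            ~is_NL, ~counts, ~nrows,
  "chr10_60842447_A_G_b38", "chr10_60871326_60871443", 0,      32968,   685,
  "chr10_60842447_A_G_b38", "chr10_60871326_60871443", 1,      1440,    20,
  "chr10_60842447_A_G_b38", "chr10_60871326_60871443", 2,      337,     1,
  "chr10_60846892_G_A_b38", "chr10_60871326_60871443", 0,      33157,   690,
  "chr10_60846892_G_A_b38", "chr10_60871326_60871443", 1,      1251,    15
)

wide <- df %>%
  # Build the "counts;nrows" string as a real column first
  mutate(result = paste(counts, nrows, sep = ";")) %>%
  # Drop the source columns so they are not treated as id columns
  select(-counts, -nrows) %>%
  # Refer to columns by name; names_prefix yields is_NL_0, is_NL_1, is_NL_2
  pivot_wider(names_from = is_NL, values_from = result, names_prefix = "is_NL_")

print(wide)
```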