
How do I make column values into new columns in a data table?

I have a table that looks like this:

> dt
                 variant_id           transcript_id is_NL counts nrows
  1: chr10_60842447_A_G_b38 chr10_60871326_60871443     0  32968   685
  2: chr10_60842447_A_G_b38 chr10_60871326_60871443     1   1440    20
  3: chr10_60842447_A_G_b38 chr10_60871326_60871443     2    337     1
  4: chr10_60846892_G_A_b38 chr10_60871326_60871443     0  33157   690
  5: chr10_60846892_G_A_b38 chr10_60871326_60871443     1   1251    15
 ---                                                                  
227:  chr5_96832353_G_T_b38  chr5_96727531_96729611     1  33504   572
228:  chr5_96832353_G_T_b38  chr5_96727531_96729611     2   3352    52
229:  chr5_96834213_T_G_b38  chr5_96727531_96729611     0 110144  2208
230:  chr5_96834213_T_G_b38  chr5_96727531_96729611     1  33252   564
231:  chr5_96834213_T_G_b38  chr5_96727531_96729611     2   3352    52

I want to take the values of is_NL and make them into separate columns (e.g. is_NL_0, is_NL_1, is_NL_2) and, for now, fill them with the values from counts and nrows, semicolon-separated (e.g. 32968;685). I've been using tidyr's pivot_wider to do this but, because I'm inexperienced with this package, I've been having a little trouble:

> dt %>% pivot_wider(-c(transcript_id, variant_id), names_from = "is_NL", values_from = paste0(dt$counts, ";", dt$nrows), names_prefix = "NL_") %>% as.data.table
Error: Unknown columns `32968;685`, `1440;20`, `337;1`, `33157;690`, `1251;15` and ... 
Run `rlang::last_error()` to see where the error occurred.

I'm going to keep working on this, but I would like to know a sensible way to do it.

I'm not familiar with tidyr, but you could do:

dt[, tmp := paste(counts, nrows, sep = ";")
   ][, dcast(.SD, transcript_id + variant_id ~ is_NL, value.var = "tmp")]

             transcript_id             variant_id           0         1       2
1: chr10_60871326_60871443 chr10_60842447_A_G_b38   32968;685   1440;20   337;1
2: chr10_60871326_60871443 chr10_60846892_G_A_b38   33157;690   1251;15    <NA>
3:  chr5_96727531_96729611  chr5_96832353_G_T_b38        <NA> 33504;572 3352;52
4:  chr5_96727531_96729611  chr5_96834213_T_G_b38 110144;2208 33252;564 3352;52

Data

library(data.table)
dt <- fread("           variant_id           transcript_id is_NL counts nrows
chr10_60842447_A_G_b38 chr10_60871326_60871443     0  32968   685
chr10_60842447_A_G_b38 chr10_60871326_60871443     1   1440    20
chr10_60842447_A_G_b38 chr10_60871326_60871443     2    337     1
chr10_60846892_G_A_b38 chr10_60871326_60871443     0  33157   690
chr10_60846892_G_A_b38 chr10_60871326_60871443     1   1251    15
chr5_96832353_G_T_b38  chr5_96727531_96729611     1  33504   572
chr5_96832353_G_T_b38  chr5_96727531_96729611     2   3352    52
chr5_96834213_T_G_b38  chr5_96727531_96729611     0 110144  2208
chr5_96834213_T_G_b38  chr5_96727531_96729611     1  33252   564
chr5_96834213_T_G_b38  chr5_96727531_96729611     2   3352    52")
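
The dcast() result above names the new columns 0, 1 and 2, whereas the question asked for is_NL_0, is_NL_1 and is_NL_2. One possible way to get those names is to build the target column name before reshaping. The following is a minimal sketch under that assumption, using only the first five rows of the question's data; the helper column nl_col is a hypothetical name introduced for illustration and is not part of the original answer.

```r
library(data.table)

# Subset of the question's data, rebuilt here so the sketch runs on its own
dt <- data.table(
  variant_id    = c(rep("chr10_60842447_A_G_b38", 3),
                    rep("chr10_60846892_G_A_b38", 2)),
  transcript_id = "chr10_60871326_60871443",
  is_NL         = c(0L, 1L, 2L, 0L, 1L),
  counts        = c(32968L, 1440L, 337L, 33157L, 1251L),
  nrows         = c(685L, 20L, 1L, 690L, 15L)
)

# Combine the two value columns into a single "counts;nrows" string
dt[, tmp := paste(counts, nrows, sep = ";")]

# Hypothetical helper column: the name each is_NL value should get after reshaping
dt[, nl_col := paste0("is_NL_", is_NL)]

# Long to wide: one row per transcript/variant pair, one column per nl_col value
wide <- dcast(dt, transcript_id + variant_id ~ nl_col, value.var = "tmp")
print(wide)
```

Combinations that do not occur in the data (here, is_NL = 2 for the second variant) are left as NA, as in the output shown earlier.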

This should work fine for your case.

library(tidyverse)

df_example <- tibble::tribble(~variant_id,~transcript_id, ~is_NL, ~counts, ~ nrows,
"chr10_60842447_A_G_b38", "chr10_60871326_60871443",     0,  32968,   685,
 "chr10_60842447_A_G_b38", "chr10_60871326_60871443",    1 ,  1440  ,  20,
 "chr10_60842447_A_G_b38" ,"chr10_60871326_60871443",     2,    337  ,   1,
 "chr10_60846892_G_A_b38" ,"chr10_60871326_60871443",     0 , 33157   ,690,
 "chr10_60846892_G_A_b38" ,"chr10_60871326_60871443",     1  , 1251    ,15)

df_example %>%
  mutate(counts = counts %>% as.character(),
         nrows = nrows %>% as.character()) %>% 
  unite("result",counts,nrows,sep = ";") %>% 
  pivot_wider(names_from = is_NL,values_from = result)


# A tibble: 2 x 5
  variant_id             transcript_id           `0`       `1`     `2`  
  <chr>                  <chr>                   <chr>     <chr>   <chr>
1 chr10_60842447_A_G_b38 chr10_60871326_60871443 32968;685 1440;20 337;1
2 chr10_60846892_G_A_b38 chr10_60871326_60871443 33157;690 1251;15 NA  
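
The approach above works because values_from is given the name of an existing column. In the question's call, values_from received the pasted strings themselves, which tidyr appears to have treated as column names; that is consistent with the "Unknown columns" error. A minimal sketch of a variant that also produces the is_NL_ prefix the question asked for is shown below. It assumes the same five example rows; the column name result and the object name wide are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Same five example rows as in the answer above
df_example <- tibble::tribble(
  ~variant_id,              ~transcript_id,            ~is_NL, ~counts, ~nrows,
  "chr10_60842447_A_G_b38", "chr10_60871326_60871443", 0,      32968,   685,
  "chr10_60842447_A_G_b38", "chr10_60871326_60871443", 1,      1440,    20,
  "chr10_60842447_A_G_b38", "chr10_60871326_60871443", 2,      337,     1,
  "chr10_60846892_G_A_b38", "chr10_60871326_60871443", 0,      33157,   690,
  "chr10_60846892_G_A_b38", "chr10_60871326_60871443", 1,      1251,    15
)

wide <- df_example %>%
  # Create the combined value as a real column first
  mutate(result = paste(counts, nrows, sep = ";")) %>%
  # Drop the source columns so they are not treated as identifier columns
  select(-counts, -nrows) %>%
  # values_from refers to the column by name; names_prefix adds "is_NL_"
  pivot_wider(names_from = is_NL, values_from = result, names_prefix = "is_NL_")

print(wide)
```

Dropping counts and nrows before pivoting matters here: if they stayed in the data, pivot_wider would use them as identifier columns and the rows would not collapse to one per variant.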


