
How to coalesce two columns and retain a 3rd column in R?

I am trying to combine two ID columns into one, while keeping each individual ID's score in another column. For example, my data looks like this:

VARIANT_ID1          VARIANT_ID2        score
01_1123425_A_G_1    01_1254436_A_G_1    0.1
02_21234356_A_G_1   02_2254436_A_G_1    0.2
03_31234356_A_G_1   03_3255436_A_G_1    0.3
10_10312345_A_G_1   10_10344745_A_G_1   0.4

I am trying to output this:

VARIANT_ID1and2     score
01_1123425_A_G_1      0.1
02_21234356_A_G_1     0.2
03_31234356_A_G_1     0.3
10_10312345_A_G_1     0.4
01_1254436_A_G_1      0.1   #VARIANT_ID2 appended below VARIANT_ID1 here including their scores
02_2254436_A_G_1      0.2
03_3255436_A_G_1      0.3
10_10344745_A_G_1     0.4

I've been trying to use coalesce() from dplyr, but I haven't been able to find out how to carry the third column along. I have a biology background, so I'm not sure which other functions could handle this. Any pointers to suitable functions would be appreciated.

Input data:

structure(list(VARIANT_ID1 = c("01_1123425_A_G_1", "02_21234356_A_G_1", 
"03_31234356_A_G_1", "10_10312345_A_G_1", "11_1456768_A_G_1"), 
    VARIANT_ID2 = c("01_1254436_A_G_1", "02_2254436_A_G_1", "03_3255436_A_G_1", 
    "10_10344745_A_G_1", "11_11256437_A_G_1"), score = c(0.1, 
    0.2, 0.3, 0.4, 0.5)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

Using pivot_longer() from tidyr (part of the tidyverse):

library(tidyverse)

df <-
  structure(
    list(
      VARIANT_ID1 = c(
        "01_1123425_A_G_1",
        "02_21234356_A_G_1",
        "03_31234356_A_G_1",
        "10_10312345_A_G_1",
        "11_1456768_A_G_1"
      ),
      VARIANT_ID2 = c(
        "01_1254436_A_G_1",
        "02_2254436_A_G_1",
        "03_3255436_A_G_1",
        "10_10344745_A_G_1",
        "11_11256437_A_G_1"
      ),
      score = c(0.1,
                0.2, 0.3, 0.4, 0.5)
    ),
    row.names = c(NA, -5L),
    class = c("data.table",
              "data.frame")
  )

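# Stack both ID columns into one; Variant.ID records the source column ("1" or "2")
df %>%
  pivot_longer(
    starts_with('VARIANT_ID'),
    names_to = 'Variant.ID',
    names_prefix = 'VARIANT_ID',
    values_to = 'VARIANT_ID1and2'
  ) %>%
  # Put all IDs from VARIANT_ID1 first, then those from VARIANT_ID2
  arrange(Variant.ID) %>%
  # Keep only the combined ID and its score
  select(VARIANT_ID1and2, score, -Variant.ID)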
#> # A tibble: 10 x 2
#>    VARIANT_ID1and2   score
#>    <chr>             <dbl>
#>  1 01_1123425_A_G_1    0.1
#>  2 02_21234356_A_G_1   0.2
#>  3 03_31234356_A_G_1   0.3
#>  4 10_10312345_A_G_1   0.4
#>  5 11_1456768_A_G_1    0.5
#>  6 01_1254436_A_G_1    0.1
#>  7 02_2254436_A_G_1    0.2
#>  8 03_3255436_A_G_1    0.3
#>  9 10_10344745_A_G_1   0.4
#> 10 11_11256437_A_G_1   0.5

Created on 2020-05-11 by the reprex package (v0.3.0)

arrange(Variant.ID) is there only to sort the rows the same way as in your expected output. The Variant.ID column records which source column ("1" or "2") each ID came from. I removed it from the final table with select(..., -Variant.ID).
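
As a side note, coalesce() returns the first non-missing value at each position, so it fills gaps row by row rather than stacking one column under another, which is why it does not fit this task. If you would rather not load any packages, the same stacking can be sketched in base R. This is only a minimal illustration on the sample data from the question; the object name stacked is made up for the example.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  VARIANT_ID1 = c("01_1123425_A_G_1", "02_21234356_A_G_1", "03_31234356_A_G_1",
                  "10_10312345_A_G_1", "11_1456768_A_G_1"),
  VARIANT_ID2 = c("01_1254436_A_G_1", "02_2254436_A_G_1", "03_3255436_A_G_1",
                  "10_10344745_A_G_1", "11_11256437_A_G_1"),
  score = c(0.1, 0.2, 0.3, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Concatenate the two ID columns end to end, and repeat the whole score
# vector twice so every ID keeps the score from its original row.
# "stacked" is a hypothetical name chosen for this sketch.
stacked <- data.frame(
  VARIANT_ID1and2 = c(df$VARIANT_ID1, df$VARIANT_ID2),
  score = rep(df$score, times = 2),
  stringsAsFactors = FALSE
)

stacked
```

This assumes exactly two ID columns. With more ID columns, the pivot_longer() approach above scales better because it picks up every column matching starts_with('VARIANT_ID').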
