How to coalesce two columns and retain a 3rd column in R?
I am trying to combine 2 ID columns into one, but keep each individual ID's score in another column. For example, my data looks like this:
VARIANT_ID1 VARIANT_ID2 score
01_1123425_A_G_1 01_1254436_A_G_1 0.1
02_21234356_A_G_1 02_2254436_A_G_1 0.2
03_31234356_A_G_1 03_3255436_A_G_1 0.3
10_10312345_A_G_1 10_10344745_A_G_1 0.4
I am trying to produce this output:
VARIANT_ID1and2 score
01_1123425_A_G_1 0.1
02_21234356_A_G_1 0.2
03_31234356_A_G_1 0.3
10_10312345_A_G_1 0.4
01_1254436_A_G_1 0.1 #VARIANT_ID2 appended below VARIANT_ID1 here including their scores
02_2254436_A_G_1 0.2
03_3255436_A_G_1 0.3
10_10344745_A_G_1 0.4
I've been trying to use coalesce() from dplyr, but I haven't been able to find out how to get the 3rd column included. I have a biology background, so I'm not sure which other functions could handle this; any pointers to suitable functions would be appreciated.
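The reason coalesce() does not fit this task is that it works position by position: for each row it returns the first non-missing value across its arguments, so the result has the same length as the inputs. What the question needs is stacking, where the result is twice as long. The sketch below contrasts the two using base R only; ifelse() here is a stand-in that mimics coalesce() for two vectors, not dplyr's implementation, and the short example vectors are made up for illustration.

```r
# Two hypothetical ID vectors; id1 has a missing value in position 2.
id1 <- c("01_1123425_A_G_1", NA)
id2 <- c("01_1254436_A_G_1", "02_2254436_A_G_1")

# coalesce-style merge (illustrative stand-in): take id1 unless it is NA,
# otherwise fall back to id2. Output length equals input length.
merged <- ifelse(is.na(id1), id2, id1)

# Stacking: every element of id1 followed by every element of id2.
# Output length is the sum of the input lengths.
stacked <- c(id1, id2)

print(merged)
print(stacked)
```

merged has 2 elements (one per row), while stacked has 4, which is the shape the desired output needs.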
Input data:
structure(list(VARIANT_ID1 = c("01_1123425_A_G_1", "02_21234356_A_G_1",
"03_31234356_A_G_1", "10_10312345_A_G_1", "11_1456768_A_G_1"),
VARIANT_ID2 = c("01_1254436_A_G_1", "02_2254436_A_G_1", "03_3255436_A_G_1",
"10_10344745_A_G_1", "11_11256437_A_G_1"), score = c(0.1,
0.2, 0.3, 0.4, 0.5)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
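For reference, the requested reshape amounts to concatenating the two ID vectors and repeating the score vector once per ID column. A minimal base-R sketch of that idea follows; it rebuilds the input as a plain data.frame (an assumption: the data.table class of the original is not needed for this step), and stringsAsFactors = FALSE keeps the IDs as character on R versions before 4.0.

```r
# Example input as a plain data.frame (same values as the structure() above).
df <- data.frame(
  VARIANT_ID1 = c("01_1123425_A_G_1", "02_21234356_A_G_1", "03_31234356_A_G_1",
                  "10_10312345_A_G_1", "11_1456768_A_G_1"),
  VARIANT_ID2 = c("01_1254436_A_G_1", "02_2254436_A_G_1", "03_3255436_A_G_1",
                  "10_10344745_A_G_1", "11_11256437_A_G_1"),
  score = c(0.1, 0.2, 0.3, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# All VARIANT_ID1 values first, then all VARIANT_ID2 values;
# the score vector is repeated once for each ID column so each ID keeps its score.
long <- data.frame(
  VARIANT_ID1and2 = c(df$VARIANT_ID1, df$VARIANT_ID2),
  score = rep(df$score, times = 2),
  stringsAsFactors = FALSE
)

print(long)
```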
Using pivot_longer() from tidyr (part of the tidyverse):
library(tidyverse)

df <-
  structure(
    list(
      VARIANT_ID1 = c(
        "01_1123425_A_G_1",
        "02_21234356_A_G_1",
        "03_31234356_A_G_1",
        "10_10312345_A_G_1",
        "11_1456768_A_G_1"
      ),
      VARIANT_ID2 = c(
        "01_1254436_A_G_1",
        "02_2254436_A_G_1",
        "03_3255436_A_G_1",
        "10_10344745_A_G_1",
        "11_11256437_A_G_1"
      ),
      score = c(0.1, 0.2, 0.3, 0.4, 0.5)
    ),
    row.names = c(NA, -5L),
    class = c("data.table", "data.frame")
  )

df %>%
  # Stack both VARIANT_ID columns into one column; names_prefix strips
  # "VARIANT_ID", so Variant.ID holds "1" or "2" depending on the source column
  pivot_longer(starts_with('VARIANT_ID'),
               names_to = 'Variant.ID',
               names_prefix = 'VARIANT_ID',
               values_to = 'VARIANT_ID1and2') %>%
  # Put all ID1 rows before all ID2 rows, as in the desired output
  arrange(Variant.ID) %>%
  # Keep only the stacked IDs and their scores
  select(VARIANT_ID1and2, score, -Variant.ID)
#> # A tibble: 10 x 2
#> VARIANT_ID1and2 score
#> <chr> <dbl>
#> 1 01_1123425_A_G_1 0.1
#> 2 02_21234356_A_G_1 0.2
#> 3 03_31234356_A_G_1 0.3
#> 4 10_10312345_A_G_1 0.4
#> 5 11_1456768_A_G_1 0.5
#> 6 01_1254436_A_G_1 0.1
#> 7 02_2254436_A_G_1 0.2
#> 8 03_3255436_A_G_1 0.3
#> 9 10_10344745_A_G_1 0.4
#> 10 11_11256437_A_G_1 0.5
Created on 2020-05-11 by the reprex package (v0.3.0)
arrange(Variant.ID) is there only to sort the rows the same way as your provided output. The Variant.ID column records which original column each ID came from ("1" for VARIANT_ID1, "2" for VARIANT_ID2, because names_prefix strips the "VARIANT_ID" part of the column name). I removed it from the final table with select(..., -Variant.ID).
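If tidyr is not available, the same three steps (stack the ID columns while recording their source, sort by source, drop the helper column) can be sketched in base R. This is an illustrative alternative, not part of the answer above: grep() on the column names stands in for starts_with(), and the helper column is named source here (a hypothetical name playing the role of Variant.ID). order() is stable, so rows keep their original order within each source block.

```r
df <- data.frame(
  VARIANT_ID1 = c("01_1123425_A_G_1", "02_21234356_A_G_1", "03_31234356_A_G_1",
                  "10_10312345_A_G_1", "11_1456768_A_G_1"),
  VARIANT_ID2 = c("01_1254436_A_G_1", "02_2254436_A_G_1", "03_3255436_A_G_1",
                  "10_10344745_A_G_1", "11_11256437_A_G_1"),
  score = c(0.1, 0.2, 0.3, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Every column whose name starts with "VARIANT_ID" (similar to starts_with()).
id_cols <- grep("^VARIANT_ID", names(df), value = TRUE)

# One block per ID column; 'source' keeps the suffix ("1", "2", ...),
# much like names_prefix does for Variant.ID in the pivot_longer() call.
pieces <- lapply(id_cols, function(col) {
  data.frame(
    source = sub("^VARIANT_ID", "", col),
    VARIANT_ID1and2 = df[[col]],
    score = df$score,
    stringsAsFactors = FALSE
  )
})
long <- do.call(rbind, pieces)

# Sort by source (the arrange() step), then keep only the two wanted columns.
long <- long[order(long$source), c("VARIANT_ID1and2", "score")]
rownames(long) <- NULL
print(long)
```

Because id_cols is found by pattern, the same code also works if there are more than two VARIANT_ID columns.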