I have a data frame with this structure:
ID Chromosome.EU Position.EU Chromosome.AM Position.AM
AX-875 chr02 50241802 chr02 1773016
AX-964 chr02 51189882 chr05 2720414
AX-873 chr04 51371415 chr04 2902066
AX-962 chr06 51442510 chr02 2973445
AX-872 chr05 51531135 chr02 3067694
AX-877 chr02 51806507 chr05 3357612
AX-869 chr05 51816808 chr05 3367924
I want to get a subset containing only the IDs whose chromosome locations differ, restricted to the pair chr02-chr05, i.e.:
ID Chromosome.EU Position.EU Chromosome.AM Position.AM
AX-964 chr02 51189882 chr05 2720414
AX-872 chr05 51531135 chr02 3067694
AX-877 chr02 51806507 chr05 3357612
I have written a conditional expression which I think does what I am looking for:
df[(df$Chromosome.EU=="chr02" & df$Chromosome.AM=="chr05") | (df$Chromosome.EU=="chr05" & df$Chromosome.AM=="chr02"),]
However, it seems too long to me. I would like to know if a more concise form is possible. Thanks in advance!
A "tidyverse" solution. I'm not sure whether this counts as "more concise", but I think it's somewhat readable. You can create a new column by sorting and pasting the chromosome pairs to create, for example, "chr02chr05", and then filter on that.
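library(dplyr)
library(purrr)
df %>%
  # map2_chr() returns a character vector, so ChromPair is a plain column rather than a list column
  mutate(ChromPair = map2_chr(Chromosome.EU, Chromosome.AM,
                              ~ paste0(sort(c(.x, .y)), collapse = ""))) %>%
  # keep only rows whose sorted pair is chr02 + chr05, in either order
  filter(ChromPair == "chr02chr05")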
Result:
ID Chromosome.EU Position.EU Chromosome.AM Position.AM ChromPair
1 AX-964 chr02 51189882 chr05 2720414 chr02chr05
2 AX-872 chr05 51531135 chr02 3067694 chr02chr05
3 AX-877 chr02 51806507 chr05 3357612 chr02chr05
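For comparison, a base R alternative might look like the sketch below. It is not part of the answer above. It assumes the chromosome columns are character vectors rather than factors, and the name `pair` is only illustrative. The idea is that if both columns fall within the pair and the two values differ, the row must be chr02/chr05 in one order or the other.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the chromosome
# columns as character so that != compares them directly.
df <- data.frame(
  ID = c("AX-875", "AX-964", "AX-873", "AX-962", "AX-872", "AX-877", "AX-869"),
  Chromosome.EU = c("chr02", "chr02", "chr04", "chr06", "chr05", "chr02", "chr05"),
  Position.EU = c(50241802, 51189882, 51371415, 51442510, 51531135, 51806507, 51816808),
  Chromosome.AM = c("chr02", "chr05", "chr04", "chr02", "chr02", "chr05", "chr05"),
  Position.AM = c(1773016, 2720414, 2902066, 2973445, 3067694, 3357612, 3367924),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector naming the pair of interest
pair <- c("chr02", "chr05")

# Both chromosomes belong to the pair, and they are not the same chromosome
sub <- df[df$Chromosome.EU %in% pair &
          df$Chromosome.AM %in% pair &
          df$Chromosome.EU != df$Chromosome.AM, ]
sub
```

This is not much shorter than the original condition, but changing the pair only requires editing `pair`.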