Subset data frame when two columns have specific different values

Question

I have a data frame with this structure:

    ID  Chromosome.EU  Position.EU  Chromosome.AM  Position.AM
AX-875          chr02     50241802          chr02      1773016
AX-964          chr02     51189882          chr05      2720414
AX-873          chr04     51371415          chr04      2902066
AX-962          chr06     51442510          chr02      2973445
AX-872          chr05     51531135          chr02      3067694
AX-877          chr02     51806507          chr05      3357612
AX-869          chr05     51816808          chr05      3367924

I want a subset containing only the IDs whose two chromosome locations differ, restricted to the pair chr02-chr05, i.e.:

    ID  Chromosome.EU  Position.EU  Chromosome.AM  Position.AM
AX-964          chr02     51189882          chr05      2720414
AX-872          chr05     51531135          chr02      3067694
AX-877          chr02     51806507          chr05      3357612

I have written a conditional expression which I think does what I am looking for:

df[(df$Chromosome.EU=="chr02" & df$Chromosome.AM=="chr05") | (df$Chromosome.EU=="chr05" & df$Chromosome.AM=="chr02"),]

However, it seems too long to me. I would like to know whether a more concise form is possible. Thanks in advance!

Answer 1

A "tidyverse" solution. I'm not sure whether this counts as "more concise", but I think it's somewhat readable. You can create a new column by sorting and pasting the chromosome pairs to create, for example, "chr02chr05", and then filter on that.

library(dplyr)
library(purrr)

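df %>% 
  # map2_chr() returns a character vector; plain map2() would create a list column
  mutate(ChromPair = map2_chr(Chromosome.EU, Chromosome.AM, ~
                              paste0(sort(c(.x, .y)), collapse = ""))) %>% 
  # sorting makes the key order-independent, so both chr02/chr05 and chr05/chr02 match
  filter(ChromPair == "chr02chr05")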

Result:

      ID Chromosome.EU Position.EU Chromosome.AM Position.AM  ChromPair
1 AX-964         chr02    51189882         chr05     2720414 chr02chr05
2 AX-872         chr05    51531135         chr02     3067694 chr02chr05
3 AX-877         chr02    51806507         chr05     3357612 chr02chr05
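
If you would rather stay in base R, the same order-independent idea can be sketched without the extra column. This is only a sketch: it assumes both chromosome columns are character vectors (not factors), and `pair`, `keep_in`, `keep_minmax` and `subset_df` are names I made up for illustration. The data frame is rebuilt from the question so the snippet runs on its own.

```r
# Example data copied from the question (columns assumed to be character, not factor)
df <- data.frame(
  ID            = c("AX-875", "AX-964", "AX-873", "AX-962", "AX-872", "AX-877", "AX-869"),
  Chromosome.EU = c("chr02", "chr02", "chr04", "chr06", "chr05", "chr02", "chr05"),
  Position.EU   = c(50241802, 51189882, 51371415, 51442510, 51531135, 51806507, 51816808),
  Chromosome.AM = c("chr02", "chr05", "chr04", "chr02", "chr02", "chr05", "chr05"),
  Position.AM   = c(1773016, 2720414, 2902066, 2973445, 3067694, 3357612, 3367924),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the two chromosomes of interest, in sorted order
pair <- c("chr02", "chr05")

# Option 1: both values belong to the pair and they differ from each other
keep_in <- df$Chromosome.EU %in% pair &
  df$Chromosome.AM %in% pair &
  df$Chromosome.EU != df$Chromosome.AM

# Option 2: compare the element-wise smaller and larger value against the sorted pair
keep_minmax <- pmin(df$Chromosome.EU, df$Chromosome.AM) == pair[1] &
  pmax(df$Chromosome.EU, df$Chromosome.AM) == pair[2]

# Either logical vector can be used to subset the rows
subset_df <- df[keep_in, ]
print(subset_df)
```

Both options avoid row-wise iteration, which may matter on a large table. If the columns are factors, converting them with as.character() first is probably the safest route.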
