
Get unique combination from two columns in R

I have a file.csv in the format shown below. I need to compare the LeftChr and RightChr columns and get the unique combinations. I then want to strip the chr prefix, write each unique combination as t(left:right), and prefix the list with the file name, as shown in the result below.

Id    LeftChr  LeftPosition  LeftStrand  LeftLength  RightChr
4465  chr1     33478980      +           60          chr1
4751  chr1     37908641      +           370         chr2
1690  chr1     37938262      -           112         chr5
4464  chr1     37938376      +           122         chr2
4463  chr2     59097215      +           675         chr2

Expected result:

file.csv: t(1:1), t(1:2), t(1:5), t(2:2)

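Assuming you've read this into a data frame called data:

# Build "t(chrA:chrB)" for every row, strip every "chr", keep the unique strings
x <- with(data, unique(gsub(pattern = "chr",
                            replacement = "",
                            x = paste0("t(", LeftChr, ":", RightChr, ")"))))

paste0("file.csv: ", paste(x, collapse = ", "))
# [1] "file.csv: t(1:1), t(1:2), t(1:5), t(2:2)"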
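To produce one such line per file, the same steps can be wrapped in a function. This is a minimal sketch: the name `chr_pair_summary` is hypothetical, and it assumes each file is whitespace-delimited with a header row, like the sample above (if the files really are comma-separated, `read.csv()` would be the reader to use instead).

```r
# Hypothetical helper: summarise the unique LeftChr/RightChr pairs of one file.
# Assumes a whitespace-delimited file with a header row, as in the sample data.
chr_pair_summary <- function(path) {
  data <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  # Build "t(chrA:chrB)" per row, strip every "chr", keep first occurrences only
  pairs <- unique(gsub("chr", "",
                       paste0("t(", data$LeftChr, ":", data$RightChr, ")"),
                       fixed = TRUE))
  # Prefix the comma-separated list with the file's base name
  paste0(basename(path), ": ", paste(pairs, collapse = ", "))
}

# Usage over several files (the file names here are placeholders):
# files <- c("file.csv", "other.csv")
# writeLines(vapply(files, chr_pair_summary, character(1)))
```

Using `unique()` on the formatted strings keeps the pairs in order of first appearance, which matches the expected result.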
A dplyr alternative, reading the sample data inline:

dat <- read.table(text = "
Id    LeftChr  LeftPosition  LeftStrand  LeftLength  RightChr
4465  chr1     33478980      +           60          chr1
4751  chr1     37908641      +           370         chr2
1690  chr1     37938262      -           112         chr5
4464  chr1     37938376      +           122         chr2
4463  chr2     59097215      +           675         chr2
", header = TRUE, as.is = TRUE)

library(dplyr)

# Strip the "chr" prefix from both columns, then keep the distinct pairs
dat %>%
  mutate(lc = gsub("chr", "", LeftChr), rc = gsub("chr", "", RightChr)) %>%
  select(lc, rc) %>%
  distinct()
#   lc rc
# 1  1  1
# 2  1  2
# 3  1  5
# 4  2  2
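The dplyr pipeline stops at the distinct pairs, so the two columns still have to be pasted into the requested one-line summary. A base-R sketch of that last step, assuming `pairs` is a data frame with character columns `lc` and `rc` like the one printed above (here it is rebuilt by hand from that printed output):

```r
# Distinct stripped pairs, as returned by the pipeline above
pairs <- data.frame(lc = c("1", "1", "1", "2"),
                    rc = c("1", "2", "5", "2"),
                    stringsAsFactors = FALSE)

# Format each pair as t(lc:rc), join with ", ", and prefix the file name
result <- paste0("file.csv: ",
                 paste0("t(", pairs$lc, ":", pairs$rc, ")", collapse = ", "))
print(result)
# [1] "file.csv: t(1:1), t(1:2), t(1:5), t(2:2)"
```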
