
How to remove duplicates in a loop in R

I have a loop that goes through a large number of .tsv files and runs a function to output the results to one file. The loop works; however, some of the .tsv files have duplicate values in one of the columns, which prevents the loop from completing. I need to remove the rows with duplicate values in column V5. I have tried commands from previous answers on this site, but they are not working for some reason.

My input .tsv files look like this (other_trait):

V1   V2         V3   V4   V5
10   201874235  G    T    rs389130213
10   201876195  G    C    rs121467298
10   201876295  T    A    rs121467298

My code starts as below, formatting the files before running them through the function.

files <- list.files(path = ".", pattern = ".tsv")
files
datalist = list()
for(i in 1:length(files)) {  
  other_trait <- read.table(files[i])
  colnames(other_trait)[which(names(other_trait) == "V2")] <- "BP"
  other_trait<- merge(other_trait, subset_1[,c("BP","MAF")], by="BP")
  other_trait <- unique(other_trait$V5)

I have tried using unique as above, and also other_trait <- other_trait[,(duplicated(other_trait$V5)), ]. unique(other_trait$V5) throws away the other columns of the data frame and retains only the unique values of V5, and the !duplicated() approach doesn't seem to do anything!
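For the base-R route, the logical vector from duplicated() has to go in the row index (before the comma) and be negated with !; placing it after the comma selects columns instead, which is why the attempt above appears to do nothing. A minimal sketch with toy data mirroring the sample rows (not the asker's real files):

```r
# Toy data frame matching the sample .tsv rows above
other_trait <- data.frame(
  V1 = c(10, 10, 10),
  V2 = c(201874235, 201876195, 201876295),
  V3 = c("G", "G", "T"),
  V4 = c("T", "C", "A"),
  V5 = c("rs389130213", "rs121467298", "rs121467298")
)

# !duplicated() in the ROW position keeps the first occurrence of each V5
# value while preserving every column of the data frame
deduped <- other_trait[!duplicated(other_trait$V5), ]
nrow(deduped)  # 2 rows remain: one per distinct rsID
```

Note the difference from unique(other_trait$V5), which returns a bare character vector rather than subsetting the data frame.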

One posted answer reproduces the data, then keeps one row per V5 value with dplyr::distinct():

df <- read.table(text = "V1 V2 V3 V4 V5
10 201874235 G T rs389130213
10 201876195 G C rs121467298
10 201876295 T A rs121467298", h = T)

library(dplyr)
df %>% 
  rename(BP = V2) %>% 
  left_join(subset_1[,c("BP","MAF")], by="BP") %>% 
  distinct(V5, .keep_all = T)
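Folding the dedup step back into the original loop can be sketched as below. The two toy .tsv files and the stand-in subset_1 lookup table are invented for illustration; in the real script, subset_1 and the file list come from the asker's environment.

```r
# Write two small space-separated files into a temp directory as stand-ins
# for the asker's .tsv inputs (invented data, for illustration only)
dir <- tempfile(); dir.create(dir)
writeLines(c("10 201874235 G T rs389130213",
             "10 201876195 G C rs121467298",
             "10 201876295 T A rs121467298"),
           file.path(dir, "a.tsv"))

# Stand-in for the asker's subset_1 (a BP -> MAF lookup table)
subset_1 <- data.frame(BP  = c(201874235, 201876195, 201876295),
                       MAF = c(0.01, 0.02, 0.03))

files <- list.files(path = dir, pattern = "\\.tsv$", full.names = TRUE)
datalist <- list()
for (i in seq_along(files)) {
  other_trait <- read.table(files[i])
  colnames(other_trait)[names(other_trait) == "V2"] <- "BP"
  other_trait <- merge(other_trait, subset_1[, c("BP", "MAF")], by = "BP")
  # Drop rows with duplicated rsIDs, keeping the first occurrence and
  # every column -- the fix for the asker's unique()/duplicated() attempts
  other_trait <- other_trait[!duplicated(other_trait$V5), ]
  datalist[[i]] <- other_trait
}
nrow(datalist[[1]])  # 2: the repeated rs121467298 row is gone
```

Using `pattern = "\\.tsv$"` also anchors the match so that only files ending in .tsv are picked up, rather than any filename containing "tsv".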
