
How do I remove duplicates in R?

I have file1.txt:

L["Corn Flakes"] = ""
L["Rice Oats"] = ""
L["Shreddies"] = ""

and file2.txt:

L["Marshmellows"] = "Tesco"
L["Golden Syrup"] = "Morrisons"
L["Corn Flakes"] = "Tesco"
L["Bran Flakes"] = "Asda"
L["Super Flakes"] = "Asda"
L["Rice Oats"] = "Asda"
L["Shreddies"] = "Morrisons"
L["Rice Krispies"] = "Tesco"

I have this script, which merges file2 into file1, but only for rows whose key (the part before the =) also exists in file1:

# Read file1
file1 <- read.table('file1.txt', sep = '=', quote = '', fill = TRUE)

# Remove commented out rows from file1
file1 <- file1[!grepl('--', file1$V1), ]

# Remove line that contains {}
file1 <- file1[!grepl('\\{\\}', file1$V2), ]

# Read file2
file2 <- read.table('file2.txt', sep = '=', quote = '', fill = TRUE)

# Remove rows from file2 if they are incomplete
file2 <- file2[!grepl('""', file2$V2),]

# Keep only the rows of file2 whose key (V1) also appears in file1
result <- file2[file2$V1 %in% file1$V1, ]

# Write to file
write.table(result, 'result.txt', sep = '=', col.names = FALSE, row.names = FALSE, quote = FALSE)
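To try the script without creating files, the same steps can be run on the sample data above by handing the lines to read.table through its `text` argument instead of a file name. This is a minimal sketch: the inline strings (`file1_lines`, `file2_lines` are names introduced here) simply mirror file1.txt and file2.txt, and the filtering steps are copied from the script.

```r
# Sample data copied from file1.txt and file2.txt, passed inline via `text`
file1_lines <- c('L["Corn Flakes"] = ""',
                 'L["Rice Oats"] = ""',
                 'L["Shreddies"] = ""')
file2_lines <- c('L["Marshmellows"] = "Tesco"',
                 'L["Golden Syrup"] = "Morrisons"',
                 'L["Corn Flakes"] = "Tesco"',
                 'L["Bran Flakes"] = "Asda"',
                 'L["Super Flakes"] = "Asda"',
                 'L["Rice Oats"] = "Asda"',
                 'L["Shreddies"] = "Morrisons"',
                 'L["Rice Krispies"] = "Tesco"')

# Same read options as the script: split on "=", keep quote characters literally
file1 <- read.table(text = file1_lines, sep = '=', quote = '', fill = TRUE,
                    stringsAsFactors = FALSE)
file2 <- read.table(text = file2_lines, sep = '=', quote = '', fill = TRUE,
                    stringsAsFactors = FALSE)

# Same filtering steps as the script
file1 <- file1[!grepl('--', file1$V1), ]
file1 <- file1[!grepl('\\{\\}', file1$V2), ]
file2 <- file2[!grepl('""', file2$V2), ]
result <- file2[file2$V1 %in% file1$V1, ]

# Print each kept row in its original "key = value" form
writeLines(paste(result$V1, result$V2, sep = '='))
```

Because strip.white is left at its default (FALSE), the spaces around "=" stay inside V1 and V2, which is why pasting with sep = '=' reproduces the original lines.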

The script works fine except for one small thing: I don't want result to include rows that are identical in file1 and file2. For example, if file1 contains L["Shreddies"] = "Tescos" and file2 also contains L["Shreddies"] = "Tescos", I don't want L["Shreddies"] = "Tescos" to appear in result. How do I do this?

This should do it. Run it after your existing code has created result and before the call to write.table:

result <- unique(result)
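Note that unique() only drops rows repeated within result itself; it never looks at file1. A small sketch with hypothetical data (the values below are made up for illustration) shows a row identical to one in file1 surviving unique():

```r
# Hypothetical data: the Shreddies row is identical in file1 and result,
# and the Rice Oats row appears twice inside result
file1 <- data.frame(V1 = c('L["Shreddies"] ', 'L["Rice Oats"] '),
                    V2 = c(' "Tescos"', ' "Asda"'),
                    stringsAsFactors = FALSE)
result <- data.frame(V1 = c('L["Shreddies"] ', 'L["Rice Oats"] ', 'L["Rice Oats"] '),
                     V2 = c(' "Tescos"', ' "Morrisons"', ' "Morrisons"'),
                     stringsAsFactors = FALSE)

deduped <- unique(result)

# Only the repeated Rice Oats row is dropped
print(nrow(deduped))  # [1] 2

# Which remaining rows also exist, word for word, in file1?
# (TRUE for Shreddies, FALSE for Rice Oats)
print(paste(deduped$V1, deduped$V2, sep = "=") %in%
        paste(file1$V1, file1$V2, sep = "="))
```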

EDIT, after understanding the problem better. I think this should do it:

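# %in% does not compare data frames row by row, so build one "V1=V2" key per row
result <- file2[(file2$V1 %in% file1$V1) &
                !(paste(file2$V1, file2$V2, sep = "=") %in%
                  paste(file1$V1, file1$V2, sep = "=")), ]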
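Applying %in% directly to two data frames compares whole columns rather than rows, so it yields one value per column of file2. A self-contained sketch with hypothetical data frames (same two-column shape that read.table produces) shows this and the row-wise alternative of collapsing each row into a single key string. Using "=" as the separator is a choice made here; it is safe because the fields were split on "=" and so cannot contain it.

```r
# Hypothetical data frames shaped like the script's file1/file2
file1 <- data.frame(V1 = c('L["Shreddies"] ', 'L["Corn Flakes"] '),
                    V2 = c(' "Tescos"', ' "Asda"'),
                    stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c('L["Shreddies"] ', 'L["Corn Flakes"] ', 'L["Bran Flakes"] '),
                    V2 = c(' "Tescos"', ' "Tesco"', ' "Asda"'),
                    stringsAsFactors = FALSE)

# %in% on data frames matches columns (list elements), not rows:
# one logical per column of file2, not one per row
print(length(file2[c("V1", "V2")] %in% file1[c("V1", "V2")]))  # [1] 2

# Row-wise comparison: collapse each row into one "V1=V2" key first
key2 <- paste(file2$V1, file2$V2, sep = "=")
key1 <- paste(file1$V1, file1$V2, sep = "=")
result <- file2[(file2$V1 %in% file1$V1) & !(key2 %in% key1), ]
print(result)
```

Only the Corn Flakes row survives here: Shreddies is identical in both tables, and Bran Flakes has no key in file1.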

One more EDIT:

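library(dplyr)

# Keep rows of file2 whose key also appears in file1
result <- file2[file2$V1 %in% file1$V1, ]

# Drop rows whose key and value both match a row in file1
result <- result %>%
  anti_join(file1, by = c("V1", "V2"))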
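The same idea can be written as a single dplyr pipeline, with semi_join standing in for the %in% filter. This is a sketch on hypothetical data frames; it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical tables in the same two-column shape as the script's file1/file2
file1 <- data.frame(V1 = c('L["Shreddies"] ', 'L["Corn Flakes"] '),
                    V2 = c(' "Tescos"', ' "Asda"'),
                    stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c('L["Shreddies"] ', 'L["Corn Flakes"] ', 'L["Bran Flakes"] '),
                    V2 = c(' "Tescos"', ' "Tesco"', ' "Asda"'),
                    stringsAsFactors = FALSE)

# semi_join keeps file2 rows whose V1 appears in file1 (the %in% filter);
# anti_join then drops rows whose V1 *and* V2 both match a row of file1
result <- file2 %>%
  semi_join(file1, by = "V1") %>%
  anti_join(file1, by = c("V1", "V2"))

print(result)
```

Both joins only filter rows of file2 and never add columns from file1, so result keeps exactly the V1/V2 layout that write.table expects.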
