How to remove duplicates in R?

Question

So I have file1.txt:

L["Corn Flakes"] = ""
L["Rice Oats"] = ""
L["Shreddies"] = ""

and file2.txt:

L["Marshmellows"] = "Tesco"
L["Golden Syrup"] = "Morrisons"
L["Corn Flakes"] = "Tesco"
L["Bran Flakes"] = "Asda"
L["Super Flakes"] = "Asda"
L["Rice Oats"] = "Asda"
L["Shreddies"] = "Morrisons"
L["Rice Krispies"] = "Tesco"

So I have this script, which merges file2 into file1 but only for rows that exist in file1:

# Read file1
file1 <- read.table('file1.txt', sep = '=', quote = '', fill = TRUE)

# Remove commented out rows from file1
file1 <- file1[!grepl('--', file1$V1), ]

# Remove rows that contain {}
file1 <- file1[!grepl('\\{\\}', file1$V2), ]

# Read file2
file2 <- read.table('file2.txt', sep = '=', quote = '', fill = TRUE)

# Remove rows from file2 if they are incomplete
file2 <- file2[!grepl('""', file2$V2),]

# Keep only the (complete) file2 rows whose key (V1) also appears in file1
result <- file2[file2$V1 %in% file1$V1, ]

# Write to file
write.table(result, 'result.txt', sep = '=', col.names = FALSE, row.names = FALSE, quote = FALSE)

The script works fine except for one small thing. I don't want result to include rows that are identical in file1 and file2. So if file1 contains L["Shreddies"] = "Tescos" and file2 contains L["Shreddies"] = "Tescos", I don't want L["Shreddies"] = "Tescos" to be included in result. How do I do this?

Answer 1

This should do it. Run it after your existing code has created result and before write.table:

result <- unique(result)
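For reference, unique() only drops rows that are repeated inside result itself; it never compares result with file1, which is why this first suggestion does not cover the case in the question. A tiny illustration with a made-up data frame (the values are illustrative only):

```r
# Hypothetical data frame in which one row appears twice
result <- data.frame(V1 = c("Corn Flakes", "Shreddies", "Shreddies"),
                     V2 = c("Tesco", "Morrisons", "Morrisons"),
                     stringsAsFactors = FALSE)

# unique() on a data frame removes repeated rows, keeping the first copy
deduped <- unique(result)
print(deduped)  # the second "Shreddies" row is gone
```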

EDIT, after better understanding the problem. I think this should do it:

# Compare whole rows: paste V1 and V2 into one key per row
result <- file2[(file2$V1 %in% file1$V1) &
                !(paste(file2$V1, file2$V2, sep = "=") %in%
                  paste(file1$V1, file1$V2, sep = "=")),]
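The reason to compare pasted keys is that %in% applied to two data frames compares them column by column as whole objects, not row by row. Here is a minimal, self-contained sketch of the row-key approach, assuming data shaped like the question's files. The inline sample data and the helper row_key() are hypothetical, and Shreddies is given the same value in both files so that it should be dropped:

```r
# Hypothetical sample data modelled on file1.txt / file2.txt;
# "Shreddies" is identical in both, so it should not reach result.
file1 <- read.table(text = c('L["Corn Flakes"] = ""',
                             'L["Rice Oats"] = ""',
                             'L["Shreddies"] = "Morrisons"'),
                    sep = "=", quote = "", fill = TRUE,
                    stringsAsFactors = FALSE)
file2 <- read.table(text = c('L["Marshmellows"] = "Tesco"',
                             'L["Corn Flakes"] = "Tesco"',
                             'L["Rice Oats"] = "Asda"',
                             'L["Shreddies"] = "Morrisons"'),
                    sep = "=", quote = "", fill = TRUE,
                    stringsAsFactors = FALSE)

# Hypothetical helper: one string per row, so %in% compares whole rows
row_key <- function(df) paste(df$V1, df$V2, sep = "=")

# Keep file2 rows whose key is in file1 but whose full row is not
result <- file2[file2$V1 %in% file1$V1 &
                !(row_key(file2) %in% row_key(file1)), ]
print(result)
```

Using "=" as the paste separator is a deliberate choice: it is the field separator in these files, so it cannot occur inside V1 or V2, and two different rows cannot produce the same key.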

One more EDIT:

result <- file2[(file2$V1 %in% file1$V1),]

library(dplyr)

result <- result %>% 
            anti_join(file1, by = c("V1", "V2"))
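If you would rather not depend on dplyr, the same anti-join can be sketched in base R. This swaps in a different technique (a left merge() with a marker column), and the data and the names file1_tagged, in_file1 and anti are hypothetical:

```r
# Hypothetical data: result already restricted to keys present in file1
file1 <- data.frame(V1 = c("Corn Flakes", "Rice Oats", "Shreddies"),
                    V2 = c("", "", "Morrisons"),
                    stringsAsFactors = FALSE)
result <- data.frame(V1 = c("Corn Flakes", "Rice Oats", "Shreddies"),
                     V2 = c("Tesco", "Asda", "Morrisons"),
                     stringsAsFactors = FALSE)

# Tag every file1 row, left-join on both columns, and keep the rows of
# result that found no identical partner (their tag is NA)
file1_tagged <- transform(file1, in_file1 = TRUE)
joined <- merge(result, file1_tagged, by = c("V1", "V2"), all.x = TRUE)
anti <- joined[is.na(joined$in_file1), c("V1", "V2")]
print(anti)
```

Note that merge() sorts its output by the key columns by default, so the row order of anti may differ from that of result.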

Answer 1: accepted, 2021-04-27 14:48:26