How do I remove duplicates in R?
So I have file1.txt:
L["Corn Flakes"] = ""
L["Rice Oats"] = ""
L["Shreddies"] = ""
and file2.txt:
L["Marshmellows"] = "Tesco"
L["Golden Syrup"] = "Morrisons"
L["Corn Flakes"] = "Tesco"
L["Bran Flakes"] = "Asda"
L["Super Flakes"] = "Asda"
L["Rice Oats"] = "Asda"
L["Shreddies"] = "Morrisons"
L["Rice Krispies"] = "Tesco"
So I have this script, which merges file2 into file1 but only for rows that exist in file1.
# Read file1
file1 <- read.table('file1.txt', sep = '=', quote = '', fill = TRUE)
# Remove commented out rows from file1
file1 <- file1[!grepl('--', file1$V1), ]
# Remove line that contains {}
file1 <- file1[!grepl('\\{\\}', file1$V2), ]
# Read file2
file2 <- read.table('file2.txt', sep = '=', quote = '', fill = TRUE)
# Remove rows from file2 if they are incomplete
file2 <- file2[!grepl('""', file2$V2),]
# Merge file1 and file2 into result but only including rows that are complete
result <- file2[file2$V1 %in% file1$V1, ]
# Write to file
write.table(result, 'result.txt', sep = '=', col.names = FALSE, row.names = FALSE, quote = FALSE)
The script works fine except for one small thing: I don't want result to include rows that are identical in file1 and file2. So if file1 contains L["Shreddies"] = "Tescos" and file2 contains L["Shreddies"] = "Tescos", I don't want L["Shreddies"] = "Tescos" to be included in result. How do I do this?
This should do it, after you run your existing code up to result and before write.table:
result <- unique(result)
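For reference, here is a minimal sketch of what `unique()` does on a data frame. The data frame `df` and its values are made up for illustration. It drops rows that are repeated within the data frame itself and does not compare against any other data frame such as file1.

```r
# Hypothetical data frame with one repeated row (rows 1 and 3 are identical)
df <- data.frame(
  V1 = c("a", "b", "a"),
  V2 = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# unique() keeps the first occurrence of each distinct row
deduped <- unique(df)
print(deduped)
```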
EDIT after better understanding the problem. I think this should do it:
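# Compare whole rows via pasted keys; %in% on two data frames would compare columns, not rows
result <- file2[(file2$V1 %in% file1$V1) &
!(paste(file2$V1, file2$V2) %in% paste(file1$V1, file1$V2)), ]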
One more EDIT:
result <- file2[(file2$V1 %in% file1$V1),]
library(dplyr)
result <- result %>%
anti_join(file1, by = c("V1", "V2"))
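To check the behaviour on a small case, here is a minimal sketch that uses inline data instead of the two files. The sample rows are made up for illustration, including a `Shreddies`/`Tescos` pair that appears in both inputs. The row comparison is done in base R with `paste()` keys as a stand-in for `anti_join`, assuming dplyr may not be installed.

```r
# Hypothetical stand-ins for file1.txt and file2.txt, parsed like the script above
file1 <- read.table(
  text = c('L["Corn Flakes"] = ""',
           'L["Shreddies"] = "Tescos"'),
  sep = '=', quote = '', fill = TRUE, stringsAsFactors = FALSE
)
file2 <- read.table(
  text = c('L["Corn Flakes"] = "Tesco"',
           'L["Shreddies"] = "Tescos"',
           'L["Rice Krispies"] = "Tesco"'),
  sep = '=', quote = '', fill = TRUE, stringsAsFactors = FALSE
)

# Drop incomplete rows from file2, then keep only keys that exist in file1
file2 <- file2[!grepl('""', file2$V2), ]
result <- file2[file2$V1 %in% file1$V1, ]

# Build one key per row from both columns and drop rows that also occur in file1
key_file1 <- paste(file1$V1, file1$V2, sep = "=")
key_result <- paste(result$V1, result$V2, sep = "=")
result <- result[!(key_result %in% key_file1), ]

print(result)
```

In this sketch only the `Corn Flakes` row should remain: `Rice Krispies` is not a key in file1, and the `Shreddies` row is identical in both inputs.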