Find the rows which are in data.frame 1 but not in data.frame 2
I have one data.frame (Data) and a subset of this data.frame (Data2):
set.seed(1)
Data <- data.frame(id = seq(1, 10),
Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4,]
How do I get the "difference" of both data.frames? I am looking for the rows which are in Data but not in Data2.
I thought something like Data[!Data2] should have worked, but it didn't.
Thank you!
I think you're using data.table constructs on a data.frame. This should work instead:
library(data.table)
Data <- data.table(Data)
Data2 <- data.table(Data2)
setkeyv(Data,colnames(Data))
setkeyv(Data2,colnames(Data2))
Data[!Data2]
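As a hedged variation on the keyed not-join above, recent data.table versions also allow the join columns to be passed through `on=` instead of setting keys, and provide `fsetdiff()` for whole-row set differences. The sketch below uses a small made-up table (the column values and the names `only_in_data` and `only_in_data_2` are illustrative, not from the question) and assumes a data.table version that supports both features.

```r
library(data.table)

# Hypothetical toy data, smaller than the Data in the question
Data <- data.table(
  id    = 1:6,
  Diag1 = c("A123", "B123", "C123", "A123", "B123", "C123"),
  Diag2 = c("D123", "E123", "F123", "F123", "D123", "E123")
)
Data2 <- Data[1:4]

# Not-join on every column, without setting keys first
only_in_data <- Data[!Data2, on = names(Data)]

# Set difference of whole rows (duplicates are dropped by default)
only_in_data_2 <- fsetdiff(Data, Data2)

print(only_in_data)
```

Neither call modifies `Data` or reorders it, which can be convenient when the original row order matters.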
data.table keys are your (best!) friend.
library(data.table)
Data <- as.data.table(Data)
Data2 <- as.data.table(Data2)
## set whichever cols make sense as keys
setkey(Data, Diag1, Diag2, Diag3)
## or to set all columns as key, use
# setkey(Data)
## Same key for Data2
setkey(Data2, Diag1, Diag2, Diag3)
## or
# setkeyv(Data2, key(Data)) # <~ Note: Use setkeyv for strings
Data[!.(Data2)]
   id Diag1 Diag2 Diag3 Diag4 Diag5 Diag6 Diag7
1:  5  A123  F123  G123  C123  K123  M123  Q123
2: 10  A123  F123  H123  B123  L123  N123  R123
3:  9  B123  E123  I123  C123  L123  N123  P123
4:  6  C123  E123  H123  C123  L123  M123  P123
5:  7  C123  F123  G123  C123  J123  M123  Q123
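One thing worth keeping in mind with this approach: a keyed not-join compares only the key columns, so a row of Data that matches Data2 on the key but differs elsewhere is dropped as well. The following is a minimal sketch of that effect on a made-up table (the data and the names `by_key` and `by_row` are assumptions for illustration only).

```r
library(data.table)

# Hypothetical toy data: ids 1 and 2 share the same Diag1/Diag2 values
Data <- data.table(
  id    = 1:4,
  Diag1 = c("A123", "A123", "B123", "C123"),
  Diag2 = c("D123", "D123", "E123", "F123")
)
Data2 <- Data[1]  # only the row with id 1

# Key on a subset of columns: the not-join compares Diag1 and Diag2 only
setkey(Data, Diag1, Diag2)
setkey(Data2, Diag1, Diag2)
by_key <- Data[!Data2]   # id 2 is dropped too, since it matches on the key

# Key on all columns: the not-join compares whole rows
setkey(Data)
setkey(Data2)
by_row <- Data[!Data2]   # only id 1 is dropped

print(by_key)
print(by_row)
```

So the choice of key columns decides what "the same row" means; keying on all columns gives the strict row-by-row difference.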
This will solve your exact problem here, and it can probably be generalized. It uses the count function from plyr:
library(plyr)
df <- as.data.frame(rbind(Data, Data2)) # rbind data sets
df <- count(df, vars = names(df)) # count frequency of rows
subset(df, freq < 2) # subset the data.frame when freq < 2
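The counting approach relies on each row appearing exactly once in Data and on Data2 being a subset of Data. As a rough base R alternative that does not depend on row frequencies, one can build a single string key per row and filter with `%in%`. This is a different technique from `count`, shown only as a sketch; the helper `row_key` and the toy data are hypothetical, and it assumes the separator character never occurs in the data.

```r
# Hypothetical toy data, smaller than the Data in the question
Data <- data.frame(
  id    = 1:6,
  Diag1 = c("A123", "B123", "C123", "A123", "B123", "C123"),
  Diag2 = c("D123", "E123", "F123", "F123", "D123", "E123"),
  stringsAsFactors = FALSE
)
Data2 <- Data[1:4, ]

# Collapse each row into one string; "\r" is assumed not to appear in the data
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Keep the rows of Data whose key does not occur among the rows of Data2
only_in_data <- Data[!(row_key(Data) %in% row_key(Data2)), ]

print(only_in_data)
```

Unlike the frequency filter, this keeps duplicated rows of Data as long as they are absent from Data2.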