Find rows that are in data.frame 1 but not in data.frame 2

Question

I have one data.frame (Data) and a subset of this data.frame (Data2):

set.seed(1)
Data <- data.frame(id = seq(1, 10), 
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE), 
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE), 
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE), 
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE), 
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))

Data2 <- Data[1:4,]

How do I get the "difference" of the two data.frames? I am looking for the rows that are in Data but not in Data2.

I thought something like Data[!Data2] would work, but it didn't.

Thank you!

Answer 1

I think you're using data.table syntax on a data.frame. X[!Y] is data.table's "not-join" (anti-join) syntax; it has no such meaning for a plain data.frame. This should work instead:

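library(data.table)
# Convert both data.frames to data.tables
Data <- data.table(Data)
Data2 <- data.table(Data2)

# Key both tables on all of their columns, in the same order
setkeyv(Data, colnames(Data))
setkeyv(Data2, colnames(Data2))

# Not-join: rows of Data that have no matching row in Data2
Data[!Data2]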
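For comparison, here is a minimal sketch of the same not-join without calling setkeyv(), assuming a newer data.table (1.9.6 or later, as far as I know) in which [.data.table accepts an on argument. The tables dt1 and dt2 are hypothetical toy stand-ins for Data and Data2.

```r
library(data.table)  # assumes data.table >= 1.9.6 for the on= argument

# Hypothetical toy tables standing in for Data / Data2
dt1 <- data.table(id = 1:5,
                  Diag1 = c("A123", "B123", "C123", "A123", "B123"))
dt2 <- dt1[1:2]

# Not-join on every column, without setting any keys
result <- dt1[!dt2, on = names(dt1)]

# ids 3, 4 and 5 remain; rows 4 and 5 share Diag1 with dt2 but differ in id
print(result)
```

Because on lists every column, a row is only dropped when all of its values match a row of dt2.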

Answer 2

data.table keys are your (best!) friend:

library(data.table)

Data  <- as.data.table(Data)
Data2 <- as.data.table(Data2)

## set whichever cols make sense as keys
## (rows are compared on the key columns only)
setkey(Data, Diag1, Diag2, Diag3)  
## or to set all columns as key, use  
#  setkey(Data)

## Same key for Data2
setkey(Data2, Diag1, Diag2, Diag3)  
## or 
# setkeyv(Data2, key(Data))  # <~ Note: Use setkeyv for strings

## Not-join: rows of Data whose key values do not appear in Data2
Data[!Data2]

   id Diag1 Diag2 Diag3 Diag4 Diag5 Diag6 Diag7
1:  5  A123  F123  G123  C123  K123  M123  Q123
2: 10  A123  F123  H123  B123  L123  N123  R123
3:  9  B123  E123  I123  C123  L123  N123  P123
4:  6  C123  E123  H123  C123  L123  M123  P123
5:  7  C123  F123  G123  C123  J123  M123  Q123
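One thing to be aware of: the not-join compares rows on the key columns only. Because this answer keys on Diag1, Diag2 and Diag3 rather than on every column, a row of Data that agrees with some row of Data2 on those three columns is dropped even if its other columns differ. That would explain why the output above has five rows rather than the six (ids 5 to 10) one might expect. A small sketch with hypothetical toy tables x and y shows the difference:

```r
library(data.table)

# Hypothetical toy data: rows with id 1 and 3 agree on Diag1 but differ on Diag2
x <- data.table(id = 1:3,
                Diag1 = c("A123", "B123", "A123"),
                Diag2 = c("D123", "E123", "F123"))
y <- x[1]

# Keyed on Diag1 only: id 3 matches y on the key, so it is dropped as well
setkey(x, Diag1)
setkey(y, Diag1)
partial <- x[!y]

# Keyed on every column: only the exact duplicate (id 1) is dropped
setkey(x)
setkey(y)
full <- x[!y]

print(partial)  # only id 2 remains
print(full)     # ids 2 and 3 remain
```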

Answer 3

This solves your exact problem here, though it could probably be generalized. It uses the count function from plyr:

library(plyr)
df <- as.data.frame(rbind(Data, Data2)) # rbind data sets
df <- count(df, vars = names(df))       # count frequency of rows
subset(df, freq < 2)                    # subset the data.frame when freq < 2
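The count trick relies on Data2 being a subset of Data and on Data having no duplicate rows (id is unique here), so that rows occurring once after stacking are exactly the rows missing from Data2. If Data2 contained a row not in Data, that row would show up in the result too. A base-R sketch of a different technique that does not need those assumptions or plyr: paste each row into one string key and compare keys with %in%. The helpers row_keys() and anti_rows() are hypothetical names, the sketch assumes the separator never occurs inside a value, and values are compared by their character form.

```r
# Build one string key per row by pasting all columns together.
# Assumes `sep` never appears inside any value.
row_keys <- function(df, sep = "\r") {
  do.call(paste, c(unname(as.list(df)), sep = sep))
}

# Hypothetical helper: rows of df1 whose full row has no match in df2
anti_rows <- function(df1, df2) {
  keep <- !row_keys(df1) %in% row_keys(df2[names(df1)])
  df1[keep, , drop = FALSE]
}

# A cut-down version of the question's Data
set.seed(1)
Data <- data.frame(id = 1:10,
                   Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
                   Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

result <- anti_rows(Data, Data2)
print(result)  # rows with id 5 to 10
```

Reordering df2's columns with names(df1) means the two tables only need the same column names, not the same column order.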


3 answers

Answer 1: score 5, posted 2013-10-22 16:39:50

Answer 2: score 4, posted 2013-10-22 16:41:01

Answer 3: score 1, accepted, posted 2013-10-22 23:26:06
