
Find the rows which are in data.frame 1 but not in data.frame 2

I have one data.frame (Data) and a subset of this data.frame (Data2):

set.seed(1)
Data <- data.frame(id = seq(1, 10), 
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE), 
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE), 
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE), 
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE), 
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))

Data2 <- Data[1:4,]

How do I get the "difference" of the two data.frames? I am looking for the rows which are in Data but not in Data2.

I thought something like Data[!Data2] should have worked, but it didn't.

Thank you!
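On a plain data.frame, Data[!Data2] is not a join: !Data2 is evaluated as logical negation of the contents of Data2, not as "rows not in Data2" (that meaning only exists in data.table syntax, as the answers below use). As a package-free point of comparison, here is a minimal base-R sketch that builds one string key per row and keeps the rows of Data whose key does not occur in Data2. The helper name row_key and the "\r" separator are illustrative assumptions, not from the original; the separator only needs to be a character that never appears in the data.

```r
# Example data from the question
set.seed(1)
Data <- data.frame(id = seq(1, 10),
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

# Hypothetical helper: paste all columns of each row into one string key
# ("\r" is assumed never to occur inside the values).
row_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Anti-join: rows of Data whose full-row key is absent from Data2.
# %in% binds tighter than !, so this is !(key %in% keys_of_Data2).
only_in_Data <- Data[!row_key(Data) %in% row_key(Data2), ]
only_in_Data$id  # expected ids: 5 6 7 8 9 10
```

Because the key includes id, this compares whole rows; passing only some columns (for example row_key(Data[c("Diag1", "Diag2")])) would instead treat rows as equal when just those columns match.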

I think you're using data.table syntax on a data.frame. This should work instead:

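library(data.table)
# convert both data.frames to data.tables
Data <- data.table(Data)
Data2 <- data.table(Data2)

# key on every column so that the join compares whole rows
setkeyv(Data,colnames(Data))
setkeyv(Data2,colnames(Data2))

# "not-join": rows of Data with no matching row in Data2
Data[!Data2]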

data.table keys are your (best!) friend:

library(data.table)

Data  <- as.data.table(Data)
Data2 <- as.data.table(Data2)

## set whichever cols make sense as keys
setkey(Data, Diag1, Diag2, Diag3)  
## or to set all columns as key, use  
#  setkey(Data)

## Same key for Data2
setkey(Data2, Diag1, Diag2, Diag3)  
## or 
# setkeyv(Data2, key(Data))  # <~ Note: Use setkeyv for strings


Data[!.(Data2)]

   id Diag1 Diag2 Diag3 Diag4 Diag5 Diag6 Diag7
1:  5  A123  F123  G123  C123  K123  M123  Q123
2: 10  A123  F123  H123  B123  L123  N123  R123
3:  9  B123  E123  I123  C123  L123  N123  P123
4:  6  C123  E123  H123  C123  L123  M123  P123
5:  7  C123  F123  G123  C123  J123  M123  Q123
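The keyed join above matches on Diag1, Diag2 and Diag3 only, so any row of Data whose three key values also occur in Data2 is dropped, even when its other columns differ. Since Data2 is rows 1 to 4 of a 10-row table, an exact difference has 6 rows; getting 5 here suggests one extra row shares its key with a row of Data2 (the sampled values themselves also depend on the R version, because sample() changed its default algorithm in R 3.6.0). The base-R sketch below compares the two behaviours; it defines a small hypothetical helper, row_key, that is an assumption for illustration and not part of either answer.

```r
# Example data from the question
set.seed(1)
Data <- data.frame(id = seq(1, 10),
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

# Hypothetical helper: one string key per row from the chosen columns
# ("\r" is assumed never to appear inside the values).
row_key <- function(df, cols = names(df)) {
  do.call(paste, c(unname(as.list(df[cols])), sep = "\r"))
}

key_cols <- c("Diag1", "Diag2", "Diag3")

# Match on the three key columns only, like the keyed join above
by_subset <- Data[!row_key(Data, key_cols) %in% row_key(Data2, key_cols), ]

# Match on every column: the exact row-wise difference
by_all <- Data[!row_key(Data) %in% row_key(Data2), ]

nrow(by_all)     # 6, since ids are unique and Data2 is rows 1 to 4
nrow(by_subset)  # at most 6; smaller whenever a key is shared with Data2
```

Every row kept by the subset match is also kept by the full match, so choosing fewer key columns can only remove more rows, never fewer.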

This will solve your exact problem here, though it can probably be generalized. It uses the count function from plyr:

library(plyr)
df <- as.data.frame(rbind(Data, Data2)) # rbind data sets
df <- count(df, vars = names(df))       # count frequency of rows
subset(df, freq < 2)                    # subset the data.frame when freq < 2
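One caveat with the count approach: freq < 2 keeps every row that occurs exactly once in the stacked data, so it would also return rows that exist only in Data2, and it drops a row that appears twice in Data even when that row is absent from Data2. A base-R alternative that keeps the direction explicit is to stack Data2 on top of Data and use duplicated(). This is a sketch of a different technique, not the plyr method, and the variable names are illustrative.

```r
# Example data from the question
set.seed(1)
Data <- data.frame(id = seq(1, 10),
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

# Stack Data2 above Data; duplicated() flags each row that already
# appeared earlier in the stack (row names are not compared).
stacked <- rbind(Data2, Data)

# Drop the flags for the Data2 part, leaving one flag per row of Data
seen_before <- duplicated(stacked)[-seq_len(nrow(Data2))]

# Keep the unflagged rows of Data. Note: a row repeated within Data keeps
# only its first copy, because later copies are flagged as duplicates too.
only_in_Data <- Data[!seen_before, ]
```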

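Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.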

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM