
Extract the discriminating rows of two dataframes in R

I have two data frames structured like this:

X A  B  C SUM
E 1  0  1  2
F 0  0  1  1
G 1  1  0  2

and this:

X A  B  C SUM
E 1  0  1  2
F 0  0  1  1
G 1  1  0  2
H 0  0  1  1
I 0  0  0  0

The result that I want to obtain is:

 X A  B  C   
 H 0  0  1 

So I want code that creates another data frame made up only of the rows that are not present in both data frames, that is, rows that appear in just one of them. Moreover, the sum of each of these rows has to be greater than zero.

Could someone help me? Thank you!

Here's one solution to your question. Let the two data sets be mydata1 and mydata2.

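library(dplyr)
# Rows of mydata1 missing from mydata2, plus rows of mydata2 missing from mydata1,
# after dropping rows whose SUM is zero
rbind(anti_join(mydata1 %>% filter(SUM > 0), mydata2 %>% filter(SUM > 0), by = colnames(mydata1)),
      anti_join(mydata2 %>% filter(SUM > 0), mydata1 %>% filter(SUM > 0), by = colnames(mydata1)))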

Following up on the comment: one thing you can do is make sure both data frames are compared on the same columns.

library(dplyr)
# Compare only on the columns that exist in both data frames
common_columns <- intersect(colnames(mydata1), colnames(mydata2))
rbind(anti_join(mydata1 %>% filter(SUM > 0), mydata2 %>% filter(SUM > 0), by = common_columns),
      anti_join(mydata2 %>% filter(SUM > 0), mydata1 %>% filter(SUM > 0), by = common_columns))
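For readers who want to try the idea without installing dplyr, here is a minimal base R sketch of the same two-way anti-join on the common columns. The data is copied from the question; the helper `rows_only_in` is a hypothetical name introduced for illustration and is not part of the answer above.

```r
# Data from the question
mydata1 <- data.frame(
  X = c("E", "F", "G"),
  A = c(1, 0, 1), B = c(0, 0, 1), C = c(1, 1, 0), SUM = c(2, 1, 2)
)
mydata2 <- data.frame(
  X = c("E", "F", "G", "H", "I"),
  A = c(1, 0, 1, 0, 0), B = c(0, 0, 1, 0, 0), C = c(1, 1, 0, 1, 0),
  SUM = c(2, 1, 2, 1, 0)
)

# Hypothetical helper: rows of `x` that have no identical row in `y`,
# compared on the columns named in `cols`.
rows_only_in <- function(x, y, cols) {
  # Build one string key per row by pasting the compared columns together
  key_x <- do.call(paste, c(unname(as.list(x[cols])), sep = "\r"))
  key_y <- do.call(paste, c(unname(as.list(y[cols])), sep = "\r"))
  x[!key_x %in% key_y, , drop = FALSE]
}

common_columns <- intersect(colnames(mydata1), colnames(mydata2))

# Rows found on one side only, in either direction
only_one_side <- rbind(
  rows_only_in(mydata1, mydata2, common_columns),
  rows_only_in(mydata2, mydata1, common_columns)
)

# Keep rows with a positive SUM and drop the SUM column, as in the desired output
result <- only_one_side[only_one_side$SUM > 0,
                        setdiff(common_columns, "SUM"), drop = FALSE]
rownames(result) <- NULL
print(result)
```

With the question's data, only row H should remain: row I also exists in just one data frame, but its SUM is zero.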

An alternative using data.table:

library(data.table)
dat1 <- data.table(X = c("E","F","G"), A = c(1,0,1), B = c(0,0,1), C = c(1,1,0), SUM = c(2,1,2))
dat2 <- data.table(X = c("E","F","G","H","I"), A = c(1,0,1,0,0), B = c(0,0,1,0,0), C = c(1,1,0,1,0),
                   SUM = c(2,1,2,1,0))

# Stack both tables without the SUM column
dat3 <- rbind(dat1[, !(names(dat1) %in% "SUM"), with = FALSE],
              dat2[, !(names(dat2) %in% "SUM"), with = FALSE])

# Keep rows that occur exactly once in the stacked table and whose values sum to more than zero
dat3[!duplicated(dat3) & !duplicated(dat3, fromLast = TRUE) &
       rowSums(dat3[, !(names(dat3) %in% "X"), with = FALSE]) > 0]
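The filter in the data.table code above relies on calling `duplicated()` in both directions. A small base R sketch of that idiom, using only the row labels from the question as a stand-in for whole rows (the variable names are chosen here for illustration):

```r
# Labels as they would appear after stacking the question's two tables
labels <- c("E", "F", "G", "E", "F", "G", "H", "I")

# TRUE for every repeat after the first occurrence, scanning forwards
dup_forward <- duplicated(labels)
# TRUE for every repeat before the last occurrence, scanning backwards
dup_backward <- duplicated(labels, fromLast = TRUE)

# Values flagged in neither direction occur exactly once
occurs_once <- labels[!dup_forward & !dup_backward]
print(occurs_once)
```

One consequence of this approach: a row that is repeated inside a single table is also discarded, even if the other table does not contain it.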
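
Another data.table option, which matches rows on the X column only:

library(data.table)
dat1 <- data.table(X = c("E","F","G"), A = c(1,0,1), B = c(0,0,1), C = c(1,1,0), SUM = c(2,1,2))
dat2 <- data.table(X = c("E","F","G","H","I"), A = c(1,0,1,0,0), B = c(0,0,1,0,0), C = c(1,1,0,1,0),
                   SUM = c(2,1,2,1,0))

# Rows whose X label appears in only one of the two tables
D1 <- dat1[!dat1$X %in% dat2$X, ]
D2 <- dat2[!dat2$X %in% dat1$X, ]
DF <- rbind(D1, D2)
# Keep rows with a positive sum, then drop the SUM column
DF <- DF[DF$SUM > 0, ]
DF$SUM <- NULL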
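Comparing on the X label alone and comparing whole rows give the same result for the question's data, but they can differ when the same label carries different values in the two tables. The sketch below uses made-up data (not from the question) and a hypothetical helper `row_key` to show the difference in base R.

```r
# Hypothetical data: label F has a different value in column A in each table
a <- data.frame(X = c("E", "F"), A = c(1, 0), B = c(0, 0), C = c(1, 1))
b <- data.frame(X = c("E", "F"), A = c(1, 1), B = c(0, 0), C = c(1, 1))

# Comparing on the label column X only: every label matches, so nothing is returned
by_label <- b[!b$X %in% a$X, , drop = FALSE]

# Comparing whole rows: F is returned, because its values differ between the tables
row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
by_row <- b[!row_key(b) %in% row_key(a), , drop = FALSE]

print(nrow(by_label))
print(by_row)
```

Which behaviour is appropriate depends on whether X is meant to be a unique identifier or whether rows should count as equal only when every column agrees.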
