
Subtract two data.frames in R, by characters

I have a data set of 250000+ rows.

It has three columns: country, test and test_result (character, character, numeric).

The next line of code reduces my data to 102388 rows.

sub.df1 <- df <- df[!duplicated(df), ]

This line of code reduces my data to 102339 rows.

sub.df2 <- unique(df[,c('country','test')])

Now I want to see these 49 rows: the rows in sub.df1 that have the same country and test but a different test_result.

I was trying to subtract sub.df2 from sub.df1[1:2] to get sub.df3, where sub.df3 would hold the 49 combinations of country and test that appear more than once in sub.df1.

I also tried some other approaches to reach my goal (merge(), match(), table(), rle()), but none of them seems to fit my problem.

Kind regards, Brecht

If you just want the extra rows (the second and later occurrences of each country/test pair), you can use duplicated():

df[duplicated(df[, c('country', 'test')]), ]
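To make clear what that call returns, here is a small base R sketch. The data frame `toy` and its values are made up for illustration; only the column names come from the question. It also shows one possible way to select every row of a repeated pair, using the `fromLast` argument of duplicated().

```r
# Hypothetical toy data; only the column names match the question.
toy <- data.frame(
  country     = c("BE", "BE", "NL", "FR", "FR", "FR"),
  test        = c("a",  "a",  "a",  "b",  "b",  "c"),
  test_result = c(1,    2,    3,    4,    5,    6),
  stringsAsFactors = FALSE
)

keys <- toy[, c("country", "test")]

# Later occurrences only: one row per "extra" appearance of a pair.
extra <- toy[duplicated(keys), ]

# Every row whose (country, test) pair occurs more than once:
# flag repeats scanning from the top and from the bottom, then combine.
all_dups <- toy[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]

print(extra)
print(all_dups)
```

Here `extra` holds only the second BE/a and FR/b rows, while `all_dups` holds both rows of each of those pairs, which makes the differing test_result values easy to compare side by side.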

If you want all rows for the duplicated pairs, including the first occurrence of each, you could use, e.g., data.table:

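library(data.table)
setDT(df)  # convert df to a data.table by reference
setkeyv(df, c('country', 'test'))  # key on the pair so the join below uses it
# pairs that occur more than once; unique() avoids repeated output rows when a pair occurs 3+ times
dup.keys <- unique(df[duplicated(df[, list(country, test)]), list(country, test)])
df[dup.keys]  # keyed join: all rows of df for those pairs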
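As an alternative sketch of the same idea, data.table can filter on group size with `.N` instead of joining back to the duplicated keys. The table `toy` and its values are hypothetical; only the column names come from the question, and this assumes the data has already been deduplicated across all three columns as in the question.

```r
library(data.table)

# Hypothetical toy data; only the column names match the question.
toy <- data.table(
  country     = c("BE", "BE", "NL", "FR", "FR", "FR"),
  test        = c("a",  "a",  "a",  "b",  "b",  "c"),
  test_result = c(1,    2,    3,    4,    5,    6)
)

# Keep a group's rows only when the (country, test) group has more than one row.
conflicts <- toy[, if (.N > 1L) .SD, by = .(country, test)]
print(conflicts)
```

Because fully identical rows were removed beforehand, any group with more than one row must contain differing test_result values, so `conflicts` lists exactly the pairs with conflicting results.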
