
Subset dataframe by unique values within a column in R

Hello, I have a dataframe such as:

Group COL1 Event 
G1 SP1  1
G1 SP2  1
G1 SP3  2
G1 SP3  2 
G2 SP4  3
G2 SP7  3
G2 SP5  6
G3 SP1  1 
G4 SP1  6  

I want to keep only the rows whose COL1 value is unique within its Event (within each Group). Here, for example, SP3 and SP5 are each the only COL1 value in their Event.

Then I should get:

Group COL1 Event 
G1 SP3  2
G1 SP3  2 
G2 SP5  6 
G3 SP1  1 
G4 SP1  6 

SP1 and SP2 are two different COL1 values in Event 1 of group G1, so they do not pass.

SP4 and SP7 are two different COL1 values in Event 3 of group G2, so they do not pass.

You can use data.table to group by Group and Event and return the group contents (.SD) only if the number of unique COL1 values (uniqueN(COL1)) is 1.

library(data.table)
setDT(df)

df[, if(uniqueN(COL1) == 1) .SD, by = .(Group, Event)]
#    Group Event COL1
# 1:    G1     2  SP3
# 2:    G1     2  SP3
# 3:    G2     6  SP5
# 4:    G3     1  SP1
# 5:    G4     6  SP1

Data used:

df <- fread('
Group COL1 Event 
G1 SP1  1
G1 SP2  1
G1 SP3  2
G1 SP3  2 
G2 SP4  3
G2 SP7  3
G2 SP5  6
G3 SP1  1 
G4 SP1  6  
')
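To see why only five rows survive, it can help to look at the per-group counts that the if condition tests. The following is a minimal sketch, assuming the same sample data (rebuilt here with data.table() so the block runs on its own). The names counts, n_col1, keep_rows and kept are illustrative and not from the answer. The second step uses the .I row-index idiom, which is a swapped-in alternative rather than the answer's own code; it keeps the original column order.

```r
library(data.table)

# Same sample data as above, built directly instead of via fread()
df <- data.table(
  Group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G3", "G4"),
  COL1  = c("SP1", "SP2", "SP3", "SP3", "SP4", "SP7", "SP5", "SP1", "SP1"),
  Event = c(1L, 1L, 2L, 2L, 3L, 3L, 6L, 1L, 6L)
)

# Number of distinct COL1 values in each (Group, Event) block
counts <- df[, .(n_col1 = uniqueN(COL1)), by = .(Group, Event)]
print(counts)

# Row numbers (.I) of the blocks that hold a single distinct COL1 value;
# blocks that fail the test contribute no row numbers
keep_rows <- df[, .I[uniqueN(COL1) == 1], by = .(Group, Event)]$V1
kept <- df[keep_rows]
print(kept)
```

Blocks with a count of 2, such as (G1, 1) and (G2, 3), are the ones dropped by the filter.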

An option in base R using ave:

subset(df, ave(COL1, Group, Event,
      FUN = function(x) length(unique(x))) == 1)
#  Group COL1 Event
#3    G1  SP3     2
#4    G1  SP3     2
#7    G2  SP5     6
#8    G3  SP1     1
#9    G4  SP1     6
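Because ave() writes each group's result back into a copy of its first argument, the counts in the call above come back as character strings when COL1 is character (the comparison with 1 still works through coercion), and the call may produce NAs with a warning if COL1 is a factor. A minimal sketch of a variant that sidesteps this by passing row positions to ave() instead; n_distinct and res are illustrative names, and the data frame is rebuilt here so the block runs on its own:

```r
# Same sample data as in the question, as a plain data.frame
df <- data.frame(
  Group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G3", "G4"),
  COL1  = c("SP1", "SP2", "SP3", "SP3", "SP4", "SP7", "SP5", "SP1", "SP1"),
  Event = c(1L, 1L, 2L, 2L, 3L, 3L, 6L, 1L, 6L),
  stringsAsFactors = FALSE
)

# ave() over row positions: for each (Group, Event) block, count the
# distinct COL1 values found at those positions
n_distinct <- ave(seq_along(df$COL1), df$Group, df$Event,
                  FUN = function(i) length(unique(df$COL1[i])))

# Keep the rows whose block contains exactly one distinct COL1 value
res <- df[n_distinct == 1, ]
print(res)
```

Indexing df$COL1 inside FUN means the count does not depend on the storage type of COL1.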
 

Another data.table option:

setDT(df)[, .SD[uniqueN(COL1) == 1], .(Group, Event)]
#    Group Event COL1
# 1:    G1     2  SP3
# 2:    G1     2  SP3
# 3:    G2     6  SP5
# 4:    G3     1  SP1
# 5:    G4     6  SP1
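The two data.table answers differ only in how a failing group is dropped: if (...) .SD evaluates to NULL for that group, while .SD[...] subsets the group with a single FALSE and so yields zero rows. A minimal sketch, assuming the same sample data, that checks both forms produce the same table (a and b are illustrative names):

```r
library(data.table)

# Same sample data as in the question
df <- data.table(
  Group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G3", "G4"),
  COL1  = c("SP1", "SP2", "SP3", "SP3", "SP4", "SP7", "SP5", "SP1", "SP1"),
  Event = c(1L, 1L, 2L, 2L, 3L, 3L, 6L, 1L, 6L)
)

# Form 1: return the block only when the condition holds (NULL otherwise)
a <- df[, if (uniqueN(COL1) == 1) .SD, by = .(Group, Event)]

# Form 2: subset the block with a single TRUE/FALSE
b <- df[, .SD[uniqueN(COL1) == 1], by = .(Group, Event)]

# Compare the two results
print(all.equal(a, b))
```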


 