
R - Listing the levels of factor 1 that have more than 2 levels of factor 2

I have a species distribution dataset based on museum collections. What I want to do is list the collection towns (factor) where more than 2 species (factor) have been collected.

Thank you!

Generate 30 observations from three cities and 20 species (species are labelled with numbers so the data are easy to generate):

df <- data.frame( city = as.factor( rep(c('NY', 'CH', 'LA'), 10) ),
                  species = as.factor( sample(1:20, 30, replace = TRUE) )
                )

Peek at the data as a city-by-species table of counts:

table(df$city, df$species)
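The same kind of table can also answer the question as asked, i.e. which towns have more than two distinct species. The following is a minimal base R sketch; the `obs` data frame and its values are made up for illustration (fixed rather than random, so the result is reproducible), and the `> 2` threshold is taken from the question.

```r
# Small fixed example (hypothetical data, not the museum dataset)
obs <- data.frame(
  city    = factor(c("NY", "NY", "NY", "NY", "CH", "CH", "LA", "LA", "LA")),
  species = factor(c(1, 2, 3, 3, 4, 4, 5, 6, 6))
)

# Cross-tabulate city by species; each cell is the number of records
tab <- table(obs$city, obs$species)

# A species counts once per city if it has at least one record there
n_species <- rowSums(tab > 0)

# Cities where more than 2 distinct species were collected
towns <- names(n_species[n_species > 2])
print(towns)
```

Here only NY qualifies, because it has three distinct species while CH has one and LA has two. Repeated records of the same species do not inflate the count, since the table is reduced to presence/absence before summing.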

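Using plyr:

Count the observations of each species in each city with ddply from the plyr package, and return the species that were observed more than once in a city:

library(plyr)

ddply(df, .(city), .fun = function(x) {
  # count() returns one row per species, with columns x and freq
  counts <- count(x$species)
  # keep only species seen more than once in this city
  counts[counts$freq > 1, ]
})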

resulting in (the exact rows vary between runs, because the data are sampled randomly)

  city  x freq
1   CH 10    3
2   CH 12    2
3   LA  9    2
4   NY  1    2
5   NY 13    3

where x is the species, and freq is the number of observations of the species in the city.

Using dplyr:

library(dplyr)

# dplyr's count() names its result column n (not freq, as plyr does)
df %>%
  dplyr::count(city, species) %>%
  filter(n > 1)
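Both answers above list species that were recorded more than once in a city. For the original question (towns where more than two species were collected), a possible dplyr sketch is shown below. The `obs` data frame is hypothetical example data, and `n_species` is just an illustrative column name.

```r
library(dplyr)

# Small fixed example (hypothetical data, not the museum dataset)
obs <- data.frame(
  city    = c("NY", "NY", "NY", "NY", "CH", "CH", "LA", "LA", "LA"),
  species = c(1, 2, 3, 3, 4, 4, 5, 6, 6)
)

result <- obs %>%
  group_by(city) %>%
  # number of distinct species recorded in each city
  summarise(n_species = n_distinct(species)) %>%
  # keep cities with more than 2 species
  filter(n_species > 2)

print(result)
```

With this data only NY is returned, with three species. Using n_distinct() rather than n() matters here: n() would count every record, so a town with many records of a single species would wrongly pass the filter.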
