R - Listing the levels of factor 1 that have more than 2 levels of factor 2
I have a species distribution dataset based on museum collections. What I want to do is list the collection towns (a factor) where more than 2 species (also a factor) have been collected.
Thank you!
Generate 30 observations from three cities and 20 species (labelled with numbers for easy generation):
df <- data.frame(
  city    = as.factor(rep(c('NY', 'CH', 'LA'), 10)),
  species = as.factor(sample(1:20, 30, replace = TRUE))
)
Peek at the data:
table(df$city, df$species)
Using plyr:
Count the observations of each species in each city using ddply from the plyr package, and return the species that have more than one observation:
library(plyr)
# For each city, tabulate the species and keep those seen more than once
ddply(df, .(city), .fun = function(x) {
  counts <- count(x$species)
  counts[counts$freq > 1, ]
})
resulting in
  city  x freq
1   CH 10    3
2   CH 12    2
3   LA  9    2
4   NY  1    2
5   NY 13    3
where x is the species and freq is the number of observations of that species in the city.
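The same per-city species counts can also be sketched without plyr. This is only an illustration in base R: the `set.seed(1)` call and the names `counts` and `repeated` are assumptions added here for reproducibility and are not part of the original answer.

```r
# Rebuild the example data; the seed is an assumption added for reproducibility
set.seed(1)
df <- data.frame(
  city    = factor(rep(c("NY", "CH", "LA"), 10)),
  species = factor(sample(1:20, 30, replace = TRUE))
)

# Cross-tabulate city by species, then flatten to one row per pair
counts <- as.data.frame(table(city = df$city, species = df$species),
                        responseName = "freq")

# Keep the city/species pairs observed more than once
repeated <- counts[counts$freq > 1, ]
repeated
```

Unlike the ddply output, `counts` also holds zero-frequency pairs before filtering, because `table` enumerates every combination of factor levels.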
Using dplyr:
library(dplyr)
df %>%
  count(city, species) %>%  # one row per city/species pair; the count is in column n
  filter(n > 1)
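The snippets above report species seen more than once within a city. To get back to the original question, which cities have more than two distinct species, a minimal base R sketch might look like the following. It assumes a data frame with `city` and `species` columns as generated above; the helper name `cities_with_many_species`, its `min_species` argument, and the `toy` data are made up for illustration.

```r
# Hypothetical helper: return the cities in which more than `min_species`
# distinct species were recorded. Assumes columns `city` and `species`.
cities_with_many_species <- function(df, min_species = 2) {
  # Number of distinct species per city (NA for factor levels with no rows)
  n_species <- tapply(as.character(df$species), df$city,
                      function(s) length(unique(s)))
  # Keep only the cities above the threshold
  names(n_species)[!is.na(n_species) & n_species > min_species]
}

# Small example with known content: only NY has three distinct species
toy <- data.frame(
  city    = factor(c("NY", "NY", "NY", "CH", "CH", "LA")),
  species = factor(c("a", "b", "c", "a", "a", "b"))
)
cities_with_many_species(toy)  # returns "NY"
```

Repeated observations of the same species in a city are counted once, which is why CH with two records of species "a" does not qualify.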