Filtering a ggplot2 density plot by number of observations

Question

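Is it possible to filter out subsets with a small number of observations inside a ggplot2 call?

For example, take the following plot: qplot(price,data=diamonds,geom="density",colour=cut)

(Image: density plot)

The plot is a bit busy, and I would like to exclude the cut values that have few observations, i.e.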

> xtabs(~cut,diamonds)
cut
     Fair      Good Very Good   Premium     Ideal 
     1610      4906     12082     13791     21551

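the Fair and Good levels of the cut factor.

I would like a solution that adapts to an arbitrary dataset and, if possible, lets me choose not only a threshold number of observations but also, for example, the top 3 levels.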

Answer 1

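library(ggplot2)
library(plyr)  # provides count(), arrange(), desc() and .()

ggplot(subset(diamonds, cut %in% arrange(count(diamonds, .(cut)), desc(freq))[1:3,]$cut),
  aes(price, colour=cut)) + 
  geom_density() + facet_grid(~cut)

count tallies how often each value of cut occurs in the data.frame and stores the result in a freq column.
arrange sorts a data.frame by the specified column.
desc reverses the sort order, so the most frequent values come first.
Finally, subset with %in% keeps only the rows whose cut is among the top 3.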
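The one-liner above packs several steps together. As a rough sketch, the same steps can be unpacked in base R without plyr. The small data frame `df` is a synthetic stand-in for diamonds, and the names `counts`, `top3` and `big` and the threshold of 30 are assumptions made for illustration. The last selection shows how a count threshold could replace the top-3 rule.

```r
# Synthetic stand-in for diamonds; only the column names mirror the question.
set.seed(1)
df <- data.frame(
  cut   = factor(rep(c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                     times = c(5, 10, 30, 35, 50))),
  price = runif(130, 300, 18000)
)

# Observations per level, largest first.
counts <- sort(table(df$cut), decreasing = TRUE)

# Option 1: the three most frequent levels.
top3 <- names(counts)[1:3]

# Option 2: every level with at least 30 observations (arbitrary threshold).
big <- names(counts)[as.vector(counts) >= 30]

# Keep the matching rows and drop the now-empty levels from the factor.
df_top3 <- droplevels(subset(df, cut %in% top3))
df_big  <- droplevels(subset(df, cut %in% big))

# Build the plot only if ggplot2 is installed; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_top3, aes(price, colour = cut)) + geom_density()
}
```

With the real data, replacing `df` by `diamonds` should give the same selection as the plyr version, apart from the facetting.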

Answer 2

Here is my take. First create a function that returns the categories with the most observations.

firstx <- function (category, data, x = 1:3) {
  tab <- xtabs(~category, data)

  dimnames(tab)$category[order(tab, decreasing = TRUE)[x]]
}

#Then use subset to subset the data and droplevels to drop unused levels
#so they don't clutter the legend.
ggplot(droplevels(subset(diamonds, cut %in% firstx(cut, diamonds))), 
       aes(price, color = cut)) + geom_density()

I hope that helps.
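One detail of firstx is easy to miss: the bare name cut only refers to the column because the call sits inside subset(), which looks names up in the data frame first. The sketch below illustrates this on a small made-up data frame `df`; the variable names `inside`, `outside` and `explicit` are assumptions for illustration only.

```r
# Toy data: level "b" occurs 4 times, "c" 3 times, "d" twice, "a" once.
df <- data.frame(cut = rep(c("a", "b", "c", "d"), times = c(1, 4, 3, 2)))

# Same function as in the answer.
firstx <- function(category, data, x = 1:3) {
  tab <- xtabs(~category, data)
  dimnames(tab)$category[order(tab, decreasing = TRUE)[x]]
}

# Inside subset(), `cut` is resolved to the column df$cut.
inside <- subset(df, cut %in% firstx(cut, df))

# At top level, `cut` is the base R function cut(), so the call fails;
# tryCatch turns the error into a marker string.
outside <- tryCatch(firstx(cut, df), error = function(e) "error")

# Passing the column explicitly works in either context.
explicit <- firstx(df$cut, df)
```

Passing the column explicitly, as in the last call, is a reasonable habit if the helper might be used outside subset().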

Answer 3

This seems to call for writing your own subsetting function, perhaps something like this:

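mySubset <- function(dat,largestK=3,thresh=NULL){
   tbl <- sort(table(dat))  # counts per level, ascending
   if (is.null(thresh)){
      # keep rows belonging to the largestK most frequent levels
      return(dat %in% tail(names(tbl),largestK))
   }
   else{
      # keep rows whose level has at least thresh observations
      return(dat %in% names(tbl)[tbl >= thresh])
   }
}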

This can be used in a ggplot call as follows:

ggplot(diamonds[mySubset(diamonds$cut),],...)

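This code does not drop the unused levels from the factor, so keep that in mind. For this reason I usually keep categorical variables as character unless I absolutely need them to be ordered.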
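To make the caveat about levels concrete, here is a minimal sketch on a made-up data frame `df`. The logical vector `keep` stands in for the kind of mask mySubset returns, and droplevels is one possible way to discard the empty levels afterwards.

```r
# Toy factor column with three levels.
df <- data.frame(cut = factor(c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal")))

# Logical mask of the same kind that mySubset returns.
keep <- df$cut %in% c("Good", "Ideal")

# Row subsetting alone keeps all three levels, including the empty "Fair".
kept <- df[keep, , drop = FALSE]
levels_before <- levels(kept$cut)

# droplevels() removes levels that no longer occur in the data.
kept <- droplevels(kept)
levels_after <- levels(kept$cut)
```

Without this step, the unused level remains part of the factor after subsetting.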

Answer 4

## Top 3 cuts
tmp <- names(sort(summary(diamonds$cut), decreasing = TRUE))[1:3]
## use %in% rather than ==, which would recycle tmp along the column
tmp <- droplevels(subset(diamonds, cut %in% tmp))
ggplot(tmp, aes(price, color=cut)) + geom_density()

But have you considered faceting?

ggplot(diamonds, aes(price, color=cut)) + geom_density() + facet_grid(~cut)

4 solutions

Solution 1
11 accepted 2011-05-20 14:43:42

Solution 2
3 2011-05-20 14:38:20

Solution 3
2 2011-05-20 14:33:24

Solution 4
1 2011-05-20 14:25:59
