
subset data.frame for ggplot2 bar chart

I have the following data:

    Splice.Pair  proportion
1         AA-AG 0.010909091
2         AA-GC 0.003636364
3         AA-TG 0.003636364
4         AA-TT 0.007272727
5         AC-AC 0.003636364
6         AC-AG 0.003636364
7         AC-GA 0.003636364
8         AC-GG 0.003636364
9         AC-TC 0.003636364
10        AC-TG 0.003636364
11        AC-TT 0.003636364
12        AG-AA 0.010909091
13        AG-AC 0.007272727
14        AG-AG 0.003636364
15        AG-AT 0.003636364
16        AG-CC 0.003636364
17        AG-CT 0.007272727
...       ...   ...

I want to get a bar chart visualising the proportion of each splice pair, but only for splice pairs that have a proportion over, say, 0.004. I tried the following:

nc.subset <- subset(nc.dat, proportion > 0.004)
qplot(Splice.Pair, proportion, data=nc.subset, geom="bar", xlab="Splice Pair", ylab="Proportion of total non-canonical splice sites") + coord_flip()

But this just gives me a bar chart with all splice pairs on the Y-axis, except that the splice pairs that were filtered out are missing their bars.

I have no idea what is happening to allow all categories to still be present :s

What's happening is that Splice.Pair is a factor. When you subset your data frame, the factor retains its levels attribute, which still has all of the original levels. You can avoid this kind of problem by simply wrapping your subsetting in droplevels:

nc.subset <- droplevels(subset(nc.dat, proportion > 0.004))
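To see the retained levels directly, here is a minimal sketch in base R. The toy nc.dat below is rebuilt from the first five rows shown in the question, and the name kept is made up for illustration. stringsAsFactors = TRUE is passed explicitly so that Splice.Pair is a factor whatever the default of the R version in use.

```r
# Toy version of nc.dat built from the first five rows shown in the question.
# stringsAsFactors = TRUE forces Splice.Pair to be a factor on any R version.
nc.dat <- data.frame(
  Splice.Pair = c("AA-AG", "AA-GC", "AA-TG", "AA-TT", "AC-AC"),
  proportion  = c(0.010909091, 0.003636364, 0.003636364, 0.007272727, 0.003636364),
  stringsAsFactors = TRUE
)

# Only two rows survive the filter ...
kept <- subset(nc.dat, proportion > 0.004)
nrow(kept)                   # 2

# ... but the factor still carries all five original levels.
nlevels(kept$Splice.Pair)    # 5

# droplevels() rebuilds every factor column with only the levels in use.
kept <- droplevels(kept)
levels(kept$Splice.Pair)     # "AA-AG" "AA-TT"
```

Anything that enumerates the levels of the factor rather than its values will see the filtered-out splice pairs until droplevels has been applied.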

More generally, if you dislike this kind of automatic retention of levels with factors, you can tell R to store strings as character vectors rather than factors by default by setting:

options(stringsAsFactors = FALSE)

at the beginning of your R session (stringsAsFactors can also be passed as an argument to data.frame).
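As a small illustration of the per-call form, the sketch below passes stringsAsFactors = FALSE to data.frame directly. The data are taken from the question's table; the names df_chr and kept_chr are made up for this example. Note that from R 4.0.0 onwards FALSE is already the default, so the argument mainly matters on older versions.

```r
# Build the data frame with Splice.Pair kept as a plain character column.
df_chr <- data.frame(
  Splice.Pair = c("AA-AG", "AA-GC", "AA-TT"),
  proportion  = c(0.010909091, 0.003636364, 0.007272727),
  stringsAsFactors = FALSE
)
class(df_chr$Splice.Pair)       # "character"

# A character column has no levels attribute, so nothing is left over
# after subsetting.
kept_chr <- subset(df_chr, proportion > 0.004)
unique(kept_chr$Splice.Pair)    # "AA-AG" "AA-TT"
```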

EDIT

Regarding the issue of running older versions of R that may lack droplevels, @rcs points out in a comment that the method for a single factor is very simple to implement on your own. The method for data frames is only slightly more complicated:

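droplevels.data.frame <- function (x, except = NULL, ...) 
{
    # Flag the columns that are factors
    ix <- vapply(x, is.factor, NA)
    # Leave any columns listed in `except` untouched
    if (!is.null(except)) 
        ix[except] <- FALSE
    # Re-create each remaining factor so that only the levels in use are kept
    x[ix] <- lapply(x[ix], factor)
    x
}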

But of course, the best solution is still to upgrade to the latest version of R.
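As a footnote to the single-factor case mentioned in the edit: it comes down to calling factor() on the factor again, which rebuilds the levels from the values actually present. A minimal sketch, with a made-up vector f for illustration:

```r
# A factor with three levels.
f <- factor(c("AA-AG", "AA-GC", "AA-TG"))

# Subsetting removes the value but not the level.
f_sub <- f[f != "AA-GC"]
levels(f_sub)      # "AA-AG" "AA-GC" "AA-TG"

# Calling factor() again keeps only the levels that still occur.
f_clean <- factor(f_sub)
levels(f_clean)    # "AA-AG" "AA-TG"
```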

Check whether Splice.Pair is a factor. If that's the case, use droplevels() to remove the levels that are no longer used; that should resolve your problem.

nc.subset <- subset(nc.dat, proportion > 0.004)
nc.subset$Splice.Pair <- droplevels(nc.subset$Splice.Pair)
qplot(Splice.Pair, proportion, data=nc.subset, geom="bar", xlab="Splice Pair", ylab="Proportion of total non-canonical splice sites") + coord_flip()

You may be able to incorporate droplevels into the qplot call, but that's for you to find out :-)
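One possible way to fold droplevels into the plotting call is to apply it to the data argument itself. The sketch below is an assumption about how that might look, not the answerer's own solution: it uses ggplot() with geom_col() rather than qplot, and the toy nc.dat is rebuilt from the first rows of the question's table.

```r
library(ggplot2)

# Toy data from the first rows of the question; the factor column is forced
# explicitly so the example behaves the same on any R version.
nc.dat <- data.frame(
  Splice.Pair = c("AA-AG", "AA-GC", "AA-TG", "AA-TT", "AC-AC"),
  proportion  = c(0.010909091, 0.003636364, 0.003636364, 0.007272727, 0.003636364),
  stringsAsFactors = TRUE
)

# droplevels() is applied directly to the subset handed to ggplot(),
# so no intermediate data frame is needed.
p <- ggplot(droplevels(subset(nc.dat, proportion > 0.004)),
            aes(x = Splice.Pair, y = proportion)) +
  geom_col() +
  coord_flip() +
  labs(x = "Splice Pair",
       y = "Proportion of total non-canonical splice sites")
```

Calling print(p) draws the chart; the data stored in the plot only contains the levels that passed the filter.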
