How to normalize counts in stat_binhex in R ggplot?

I have a data.frame with two variables, holding measurements for two classes that differ greatly in size (~2,500 samples vs. ~100,000 samples).

Sample code:

library(ggplot2)  # geom_hex() also requires the hexbin package

plot.gg <- ggplot(data=rbind(
                    data.frame(x=rnorm(2500, mean=0.41, sd=0.1), y=rnorm(2500, mean=12000, sd=1000), type="A"),
                    data.frame(x=rnorm(100000, mean=0.60, sd=0.1), y=rnorm(100000, mean=6000, sd=1000), type="B")
                  ),
              mapping=aes(x=x, y=y, colour=type, group=type)
             ) + geom_hex(alpha=0.3)

plot.gg

Result: [image]

Here a single color palette is used for both classes, which results in a uniform gray fill for class A. I would like a separate color palette for class A so that its distribution is visible too.

Another acceptable option would be to normalize the data so that the fill shows percentages instead of counts. However, I cannot figure out how to use ..count.. and (..count..)/sum(..count..) here.

I also need alpha in geom_hex to see where the classes overlap.

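Found it. The solution is aes(fill=..density..) in geom_hex, which fills each hexagon by its share of the observations in its own class rather than by the raw count.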

library(ggplot2)  # geom_hex() also requires the hexbin package

plot.gg <- ggplot(data=rbind(
                    data.frame(x=rnorm(2500, mean=0.41, sd=0.1), y=rnorm(2500, mean=12000, sd=1000), type="A"),
                    data.frame(x=rnorm(100000, mean=0.60, sd=0.1), y=rnorm(100000, mean=6000, sd=1000), type="B")
                  ),
              mapping=aes(x=x, y=y, colour=type, group=type)
             ) + geom_hex(alpha=0.6, aes(fill=..density..))

plot.gg

I've also increased alpha, because the plot looks better that way.
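For anyone who wants to check what the density fill actually contains, here is a minimal sketch. It assumes ggplot2 3.3.0 or later (so that after_stat(density) can be used as the newer spelling of ..density..) and that the hexbin package is installed. The names df, p, hex_data and density_by_group, as well as the fixed seed, are only illustrative and not part of the original code. It builds the same plot and then inspects the computed layer data with layer_data().

```r
library(ggplot2)  # geom_hex() also needs the 'hexbin' package installed

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Same simulated data as above: a small class A and a much larger class B
df <- rbind(
  data.frame(x = rnorm(2500, mean = 0.41, sd = 0.1),
             y = rnorm(2500, mean = 12000, sd = 1000), type = "A"),
  data.frame(x = rnorm(100000, mean = 0.60, sd = 0.1),
             y = rnorm(100000, mean = 6000, sd = 1000), type = "B")
)

# after_stat(density) is the newer way of writing ..density..
p <- ggplot(df, aes(x = x, y = y, colour = type, group = type)) +
  geom_hex(aes(fill = after_stat(density)), alpha = 0.6)

# Pull out the per-hexagon values that the stat computed for layer 1
hex_data <- layer_data(p, 1)

# Sum of density within each group (one group per class)
density_by_group <- tapply(hex_data$density, hex_data$group, sum)
print(density_by_group)

# Per-group normalization done by hand, for comparison with density
manual_density <- hex_data$count / ave(hex_data$count, hex_data$group, FUN = sum)
print(all.equal(manual_density, hex_data$density))
```

If the sums come out as 1 for each group, the fill is normalized per class, which is why the small class A no longer disappears next to class B. Note that both classes still share one fill scale, so this addresses the normalization variant of the question rather than the separate-palette one.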
