
ratio of counts in R 2d plot

I have 2 continuous variables (X and Y) that I want to bin into a 2D grid. Each (x, y) pair also has a factor that is either PASS or FAIL. I want to plot the PASS/FAIL ratio in each cell of the 2D grid.

For example, with the iris dataset, ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_bin2d() plots the total count in each 2D bin. How do I change this to plot the ratio of virginica to versicolor counts in each bin?

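Use stat_summary2d(). First preprocess the data by converting the binary factor into a numeric 0/1 column of the data frame, then map that column to the z aesthetic, which stat_summary2d() summarizes within each bin. With the default summary (the mean), each bin shows the proportion of 1s, i.e. PASS / (PASS + FAIL), rather than the PASS/FAIL ratio itself.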
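For the iris question itself, a minimal preprocessing sketch, assuming only the two species being compared should be kept: filter to versicolor and virginica and add a 0/1 column (the names two_species and is_virginica are hypothetical) that can then be mapped to z.

```r
# Keep only the two species being compared
two_species <- subset(iris, Species %in% c("versicolor", "virginica"))

# 1 = virginica (plays the role of PASS), 0 = versicolor (FAIL);
# a logical comparison coerces cleanly to 0/1
two_species$is_virginica <- as.numeric(two_species$Species == "virginica")

table(two_species$is_virginica)
```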

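library(ggplot2)

# Random 0/1 column as a stand-in for a PASS/FAIL outcome (1 = TRUE/PASS)
set.seed(1)
iris$tf <- as.numeric(as.logical(round(runif(nrow(iris)))))

# stat_summary_2d() (older code uses the deprecated alias stat_summary2d()) takes
# the mean of z in each bin, i.e. the proportion of 1s in that cell
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, z = tf)) +
  stat_summary_2d(fun = "mean", bins = 10) +
  labs(title = "Proportion of TRUE by Petal.Length and Sepal.Length") +
  scale_fill_continuous(name = "Proportion TRUE")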
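Because the mean gives PASS / (PASS + FAIL), a literal PASS/FAIL ratio needs a custom summary function passed to fun. A sketch, assuming a current ggplot2 where the stat is spelled stat_summary_2d() and using the two-species subset of iris; ratio_pass_fail() is a hypothetical helper. Bins with no 0s would give Inf, so the helper returns NA there instead, and with the stat's default drop = TRUE such bins should be left out of the plot.

```r
library(ggplot2)

# Hypothetical helper: count of 1s divided by count of 0s in one bin.
# Returns NA instead of Inf when a bin contains no 0s.
ratio_pass_fail <- function(z) {
  n_pass <- sum(z == 1)
  n_fail <- sum(z == 0)
  if (n_fail == 0) NA_real_ else n_pass / n_fail
}

two_species <- subset(iris, Species %in% c("versicolor", "virginica"))
two_species$is_virginica <- as.numeric(two_species$Species == "virginica")

p <- ggplot(two_species,
            aes(x = Sepal.Length, y = Petal.Length, z = is_virginica)) +
  stat_summary_2d(fun = ratio_pass_fail, bins = 10) +
  labs(title = "virginica / versicolor count ratio per bin") +
  scale_fill_continuous(name = "Ratio")
```

Printing p draws the plot. A ratio is unbounded above, so a log-scaled fill may read better when some bins are heavily one-sided.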

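Note: as.numeric() on a two-level factor returns the level codes 1 and 2 rather than 0 and 1, so subtract 1 from the result. With the default alphabetical levels, FAIL then becomes 0 and PASS becomes 1. A logical vector already coerces to 0/1, so no adjustment is needed in that case.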
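A quick illustration of the coercion described in the note, using a hypothetical PASS/FAIL vector named result. A factor's numeric codes follow its level order (alphabetical by default), while comparing against the level name yields 0/1 directly, regardless of level order.

```r
result <- factor(c("PASS", "FAIL", "PASS", "PASS"))

levels(result)                 # "FAIL" "PASS": alphabetical by default
as.numeric(result)             # level codes: 2 1 2 2
as.numeric(result) - 1         # 1 0 1 1, so PASS -> 1, FAIL -> 0
as.numeric(result == "PASS")   # 1 0 1 1, independent of level order
```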

Edit: added default fun='mean' argument to stat_summary2d() to make it clear this is the default behaviour of the function.
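To see what fun = "mean" produces, here is a base-R sketch of the same per-bin summary. It is only an approximation: it bins with cut() into 10 equal-width intervals per axis, which will not match ggplot2's bin edges exactly, and it uses the same hypothetical two-species 0/1 column as above. Within each cell the mean is the proportion p of 1s, and the PASS/FAIL ratio follows as p / (1 - p).

```r
two_species <- subset(iris, Species %in% c("versicolor", "virginica"))
z <- as.numeric(two_species$Species == "virginica")

# Approximate 2D bins: 10 equal-width intervals per axis
x_bin <- cut(two_species$Sepal.Length, breaks = 10)
y_bin <- cut(two_species$Petal.Length, breaks = 10)

# Per-cell mean of the 0/1 column = proportion of virginica (what fun = "mean" shows);
# empty cells come back as NA
prop_tab <- tapply(z, list(x_bin, y_bin), mean)

# Per-cell count ratio virginica / versicolor; Inf where a cell has no versicolor
ratio_tab <- tapply(z, list(x_bin, y_bin), function(v) sum(v == 1) / sum(v == 0))

# The two summaries are linked by ratio = p / (1 - p) in every non-empty cell
filled <- !is.na(prop_tab)
all.equal(ratio_tab[filled], prop_tab[filled] / (1 - prop_tab[filled]))
```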


