R: Calculate and plot difference between two density contours
I have two datasets with two continuous variables: duration and waiting.
library("MASS")     # provides the geyser data and kde2d()
library("ggplot2")  # needed for the plots below
data(geyser)
geyser1 <- geyser[1:150,]
geyser2 <- geyser[151:299,]
geyser2$duration <- geyser2$duration - 1
geyser2$waiting <- geyser2$waiting - 20
For each dataset I output a 2D density plot:
ggplot(geyser1, aes(x = duration, y = waiting)) +
  xlim(0.5, 6) + ylim(40, 110) +
  stat_density2d(aes(alpha = after_stat(level)),  # after_stat() replaces ..level.. in ggplot2 >= 3.3.0
                 geom = "polygon", bins = 10)
ggplot(geyser2, aes(x = duration, y = waiting)) +
  xlim(0.5, 6) + ylim(40, 110) +
  stat_density2d(aes(alpha = after_stat(level)),
                 geom = "polygon", bins = 10)
I now want to produce a plot that shows the regions where the two plots have the same density (white), negative differences (a gradient from white to blue where geyser2 is denser than geyser1), and positive differences (a gradient from white to red where geyser1 is denser than geyser2).
How can I compute and plot the difference between the densities?
You can do this by first using kde2d to calculate both densities over a common grid and then subtracting one from the other. Then reshape the result into a long-format data frame that can be fed to ggplot2.
library(MASS)     # For kde2d function
library(ggplot2)  # For plotting
library(reshape2) # For melt function
# Calculate the common x and y range for geyser1 and geyser2
xrng = range(c(geyser1$duration, geyser2$duration))
yrng = range(c(geyser1$waiting, geyser2$waiting))
# Calculate the 2d density estimate over the common range
d1 = kde2d(geyser1$duration, geyser1$waiting, lims=c(xrng, yrng), n=200)
d2 = kde2d(geyser2$duration, geyser2$waiting, lims=c(xrng, yrng), n=200)
# Confirm that the grid points for each density estimate are identical
identical(d1$x, d2$x) # TRUE
identical(d1$y, d2$y) # TRUE
# Calculate the difference between the 2d density estimates
diff12 = d1
diff12$z = d2$z - d1$z
## Melt data into long format
# First, add row and column names (x and y grid values) to the z-value matrix
rownames(diff12$z) = diff12$x
colnames(diff12$z) = diff12$y
# Now melt it to long format
diff12.m = melt(diff12$z)  # melt() on a matrix turns the row and column names into the first two columns
names(diff12.m) = c("Duration","Waiting","z")
# Plot difference between geyser2 and geyser1 density
# (blue where geyser2 is denser, red where geyser1 is denser)
ggplot(diff12.m, aes(Duration, Waiting, z=z, fill=z)) +
  geom_tile() +
  stat_contour(aes(colour=after_stat(level)), binwidth=0.001) +  # after_stat() needs ggplot2 >= 3.3.0
  scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0) +
  scale_colour_gradient2(low=scales::muted("red"), mid="white", high=scales::muted("blue"), midpoint=0) +
  coord_cartesian(xlim=xrng, ylim=yrng) +
  guides(colour="none")
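As a rough sanity check on the approach above, the sketch below uses only base R and MASS. It rebuilds the two density estimates, checks that each integrates to roughly 1 over the grid (so the difference should integrate to roughly 0), and builds the long-format data frame with expand.grid as an alternative to melt. The helper pad, its 25% padding factor, the grid size n = 100, and the names mass1, mass2, diff_long and zlim are assumptions made for this illustration and are not part of the answer; the padding is there only so the kernel tails are not cut off at the grid edge.

```r
library(MASS)  # kde2d() and the geyser data

data(geyser)
geyser1 <- geyser[1:150, ]
geyser2 <- geyser[151:299, ]
geyser2$duration <- geyser2$duration - 1
geyser2$waiting  <- geyser2$waiting - 20

# Hypothetical helper: widen a range by a fraction of its width on each side
# (0.25 is an arbitrary choice for this sketch)
pad <- function(r, frac = 0.25) r + c(-1, 1) * frac * diff(r)
xrng <- pad(range(c(geyser1$duration, geyser2$duration)))
yrng <- pad(range(c(geyser1$waiting,  geyser2$waiting)))

# Both estimates are evaluated on the same grid, so they can be subtracted
d1 <- kde2d(geyser1$duration, geyser1$waiting, lims = c(xrng, yrng), n = 100)
d2 <- kde2d(geyser2$duration, geyser2$waiting, lims = c(xrng, yrng), n = 100)

# Riemann-sum approximation of the integral of each density over the grid
cell  <- diff(d1$x[1:2]) * diff(d1$y[1:2])
mass1 <- sum(d1$z) * cell
mass2 <- sum(d2$z) * cell

# Long format without reshape2: expand.grid() varies its first argument
# fastest, which matches the column-major order of as.vector() on the z matrix
diff_long   <- expand.grid(Duration = d1$x, Waiting = d1$y)
diff_long$z <- as.vector(d2$z - d1$z)

# Largest absolute difference, usable as a symmetric colour limit
zlim <- max(abs(diff_long$z))
```

If the colour scale should be symmetric around zero, one option is to pass limits = c(-zlim, zlim) to scale_fill_gradient2 so that equal magnitudes of red and blue represent equal density differences.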