简体   繁体   中英

R: Calculate and plot difference between two density countours

I have two datasets with two continuous variables: duration and waiting .

library("MASS")
data(geyser)

geyser1 <- geyser[1:150,]

geyser2 <- geyser[151:299,]
geyser2$duration <- geyser2$duration - 1
geyser2$waiting <- geyser2$waiting - 20

For each dataset I output a 2D density plot

ggplot(geyser1, aes(x = duration, y = waiting)) +
  xlim(0.5, 6) + ylim(40, 110) +
  stat_density2d(aes(alpha=..level..),
                 geom="polygon", bins = 10)

ggplot(geyser2, aes(x = duration, y = waiting)) +
  xlim(0.5, 6) + ylim(40, 110) +
  stat_density2d(aes(alpha=..level..),
                 geom="polygon", bins = 10)

I now want to produce a plot which indicates the regions where the two plot have the same density (white), negative differences (gradation from white to blue where geyser2 is denser than geyser1 ) and positive differences (gradation from white to red where geyser1 is denser than geyser2 ).

How to compute and plot the difference of the densities?

You can do this by first using kde2d to calculate the densities and then subtracting them from each other. Then you do some data reshaping to get it into a form that can be fed to ggplot2 .

library(reshape2) # For melt function

# Calculate the common x and y range for geyser1 and geyser2
xrng = range(c(geyser1$duration, geyser2$duration))
yrng = range(c(geyser1$waiting, geyser2$waiting))

# Calculate the 2d density estimate over the common range
d1 = kde2d(geyser1$duration, geyser1$waiting, lims=c(xrng, yrng), n=200)
d2 = kde2d(geyser2$duration, geyser2$waiting, lims=c(xrng, yrng), n=200)

# Confirm that the grid points for each density estimate are identical
identical(d1$x, d2$x) # TRUE
identical(d1$y, d2$y) # TRUE

# Calculate the difference between the 2d density estimates
diff12 = d1 
diff12$z = d2$z - d1$z

## Melt data into long format
# First, add row and column names (x and y grid values) to the z-value matrix
rownames(diff12$z) = diff12$x
colnames(diff12$z) = diff12$y

# Now melt it to long format
diff12.m = melt(diff12$z, id.var=rownames(diff12))
names(diff12.m) = c("Duration","Waiting","z")

# Plot difference between geyser2 and geyser1 density
ggplot(diff12.m, aes(Duration, Waiting, z=z, fill=z)) +
  geom_tile() +
  stat_contour(aes(colour=..level..), binwidth=0.001) +
  scale_fill_gradient2(low="red",mid="white", high="blue", midpoint=0) +
  scale_colour_gradient2(low=muted("red"), mid="white", high=muted("blue"), midpoint=0) +
  coord_cartesian(xlim=xrng, ylim=yrng) +
  guides(colour=FALSE)

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM