
R: Calculate and plot difference between two density contours

I have two datasets, each with two continuous variables: duration and waiting.

library("MASS")     # provides the geyser data and kde2d
library("ggplot2")  # needed for the plots below
data(geyser)

geyser1 <- geyser[1:150,]

geyser2 <- geyser[151:299,]
geyser2$duration <- geyser2$duration - 1
geyser2$waiting <- geyser2$waiting - 20

For each dataset I produce a 2D density plot:

ggplot(geyser1, aes(x = duration, y = waiting)) +
  xlim(0.5, 6) + ylim(40, 110) +
  stat_density_2d(aes(alpha = after_stat(level)),
                  geom = "polygon", bins = 10)

ggplot(geyser2, aes(x = duration, y = waiting)) +
  xlim(0.5, 6) + ylim(40, 110) +
  stat_density_2d(aes(alpha = after_stat(level)),
                  geom = "polygon", bins = 10)

I now want to produce a plot that shows the regions where the two plots have the same density (white), negative differences (a gradient from white to blue where geyser2 is denser than geyser1) and positive differences (a gradient from white to red where geyser1 is denser than geyser2).

How can I compute and plot the difference between the densities?

You can do this by first using kde2d to estimate both densities on the same grid and then subtracting one from the other. After that, reshape the result into a long format that can be fed to ggplot2.

library(reshape2) # For melt function

# Calculate the common x and y range for geyser1 and geyser2
xrng = range(c(geyser1$duration, geyser2$duration))
yrng = range(c(geyser1$waiting, geyser2$waiting))

# Calculate the 2d density estimate over the common range
d1 = kde2d(geyser1$duration, geyser1$waiting, lims=c(xrng, yrng), n=200)
d2 = kde2d(geyser2$duration, geyser2$waiting, lims=c(xrng, yrng), n=200)

# Confirm that the grid points for each density estimate are identical
identical(d1$x, d2$x) # TRUE
identical(d1$y, d2$y) # TRUE

# Calculate the difference between the 2d density estimates
diff12 = d1 
diff12$z = d2$z - d1$z

## Melt data into long format
# First, add row and column names (x and y grid values) to the z-value matrix
rownames(diff12$z) = diff12$x
colnames(diff12$z) = diff12$y

# Now melt it to long format (for a matrix, melt uses the dimnames, so no id.var is needed)
diff12.m = melt(diff12$z)
names(diff12.m) = c("Duration","Waiting","z")

# Plot difference between geyser2 and geyser1 density
ggplot(diff12.m, aes(Duration, Waiting, z=z, fill=z)) +
  geom_tile() +
  stat_contour(aes(colour=after_stat(level)), binwidth=0.001) +
  scale_fill_gradient2(low="red",mid="white", high="blue", midpoint=0) +
  # muted() comes from the scales package, so it is called with its namespace here
  scale_colour_gradient2(low=scales::muted("red"), mid="white", high=scales::muted("blue"), midpoint=0) +
  coord_cartesian(xlim=xrng, ylim=yrng) +
  guides(colour="none")
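
If you would rather not depend on reshape2, the reshaping step can also be done in base R. The following is only a sketch under a few assumptions: the names diff_long, n_grid and z_max are made up for illustration, the grid size of 100 is an arbitrary choice (the code above uses 200), and the symmetric fill limits are an optional extra rather than part of the approach above. It relies on expand.grid varying its first argument fastest, which matches the column-major order of the z matrix returned by kde2d (rows correspond to x, columns to y).

```r
library(MASS)     # kde2d and the geyser data
library(ggplot2)

data(geyser)
geyser1 <- geyser[1:150, ]
geyser2 <- geyser[151:299, ]
geyser2$duration <- geyser2$duration - 1
geyser2$waiting  <- geyser2$waiting - 20

# Shared evaluation range, as in the code above
xrng <- range(c(geyser1$duration, geyser2$duration))
yrng <- range(c(geyser1$waiting, geyser2$waiting))

n_grid <- 100  # assumed grid size, chosen only to keep the example quick
d1 <- kde2d(geyser1$duration, geyser1$waiting, lims = c(xrng, yrng), n = n_grid)
d2 <- kde2d(geyser2$duration, geyser2$waiting, lims = c(xrng, yrng), n = n_grid)

# expand.grid varies its first argument fastest, which matches the
# column-major order in which as.vector() unrolls the z matrix
# (z[i, j] is the density at x[i], y[j]).
diff_long <- expand.grid(Duration = d1$x, Waiting = d1$y)
diff_long$z <- as.vector(d2$z - d1$z)

# Symmetric limits so that equal distances from zero get equal colour intensity
z_max <- max(abs(diff_long$z))

p <- ggplot(diff_long, aes(Duration, Waiting, fill = z)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue",
                       midpoint = 0, limits = c(-z_max, z_max))
print(p)
```

As in the code above, z is geyser2 minus geyser1, so red marks regions where geyser1 is denser and blue marks regions where geyser2 is denser.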

