
Error using geom_density_2d() in R: Computation failed in `stat_density2d()`: bandwidths must be strictly positive

In an attempt to make a test 2D density plot with ggplot2, I used this code snippet:

ggplot(df, aes(x = S1.x, y = S1.y)) + geom_point() + geom_density_2d()

and I got the error: "Computation failed in stat_density2d(): bandwidths must be strictly positive"

My data frame looks like this:

> df

transcriptID S1.x      S1.y      S2.x       S2.y    
DQ459412     0.000000  0.000000  0.000000   0.000000
DQ459413     1.584963  2.358379  4.392317   3.085722    
DQ459415     0.000000  0.000000  0.000000   0.000000    
DQ459418     0.000000  0.000000  0.000000   0.000000    
DQ459419     0.000000  0.000000  4.000000   2.891544    
DQ459420     0.000000  0.000000  0.000000   0.000000      

Also, var(df[,"S1.x"]) > 0 and var(df[,"S1.y"]) > 0.

Fig 1 - 2D density plot with error

However, I got a density plot without error by running:

ggplot(df, aes(x = S2.x, y = S2.y)) + geom_point() + geom_density_2d()

Fig 2 - density plot without error

How do I address the error in Fig 1?

So the real problem is that the S1.x and S1.y columns each contain only one non-zero value. And it turns out that geom_density_2d can't really estimate a density from only a value or two. But read on...

Update:

This question has been asked before, and the answers are usually that you need to have non-zero variance in your data columns. But you do have non-zero variance, so why isn't it working?

  • Looking at the internals of geom_density_2d, we see that it uses the kde2d function from the MASS package to calculate the distribution.
  • Looking at kde2d, we see that it uses MASS::bandwidth.nrd(df$x) to get an estimate of the bandwidth.
  • Looking at the help (which includes the code) for bandwidth.nrd, we see it uses a rule of thumb that takes quantiles of the data and uses the difference between the 2nd and the 1st quantile to get a bandwidth estimate.
  • Calling quantile() on your original data, we see that every quantile up to 75% is zero.
  • And running MASS::kde2d on your original data with that bandwidth.nrd estimate of the bandwidth gives you the same error:
library(MASS)
# Rebuild the data frame from the question
nn <- c("DQ459412","DQ459413","DQ459415","DQ459418","DQ459419","DQ459420")
s1x <- c(0,1.584963,0,0,0,0)
s1y <- c(0,2.358379,0,0,0,0)
s2x <- c(0,4.392317,0,0,4,0)
s2y <- c(0,3.085722,0,0,2.891544,0)
df <- data.frame(transcriptID=nn,S1.x=s1x,S1.y=s1y,S2.x=s2x,S2.y=s2y)

> quantile(df$S1.x)
      0%      25%      50%      75%     100% 
0.000000 0.000000 0.000000 0.000000 1.584963 
> quantile(df$S1.y)
      0%      25%      50%      75%     100% 
0.000000 0.000000 0.000000 0.000000 2.358379 

# Both rule-of-thumb bandwidths come out as zero for these columns
h <- c(MASS::bandwidth.nrd(df$S1.x), MASS::bandwidth.nrd(df$S1.y))
dens <- MASS::kde2d(df$S1.x, df$S1.y, h = h, lims = c(0,1,0,1))

Error in MASS::kde2d(df$S1.x, df$S1.y, h = h, lims = c(0, 1, 0, 1)) : bandwidths must be strictly positive

So the real criterion for using geom_density_2d is that both the x- and the y-data need to have a non-zero gap between their 1st and 2nd quantiles.
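This condition can be checked before plotting. The following is a minimal sketch, assuming the data frame from the question; `bandwidth_ok` is a hypothetical helper name, not part of MASS or ggplot2. It shows that a column can have non-zero variance and still get a zero rule-of-thumb bandwidth.

```r
library(MASS)

# Data from the question (S1.* fails, S2.* works)
df <- data.frame(
  S1.x = c(0, 1.584963, 0, 0, 0, 0),
  S1.y = c(0, 2.358379, 0, 0, 0, 0),
  S2.x = c(0, 4.392317, 0, 0, 4, 0),
  S2.y = c(0, 3.085722, 0, 0, 2.891544, 0)
)

# Hypothetical helper: TRUE when the default bandwidth estimate is usable
bandwidth_ok <- function(x) MASS::bandwidth.nrd(x) > 0

variances <- sapply(df, var)           # variance of each column
usable    <- sapply(df, bandwidth_ok)  # bandwidth check for each column

print(variances)
print(usable)
```

A variance check alone would pass all four columns here, which is why the bandwidth itself is the more useful thing to test.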

Now to fix it: if I make a small modification, replacing one of the zeros with 0.1, like this:

nn <- c("DQ459412","DQ459413","DQ459415","DQ459418","DQ459419","DQ459420")
s1x <- c(0,1.584963,0,0,0.1,0)
s1y <- c(0,2.358379,0,0,0.1,0) 
s2x <- c(0,4.392317,0,0,4,0)
s2y <- c(0,3.085722,0,0,2.891544,0) 
df <- data.frame(transcriptID=nn,S1.x=s1x,S1.y=s1y,S2.x=s2x,S2.y=s2y)
print(df)

yielding:

  transcriptID     S1.x     S1.y     S2.x     S2.y
1     DQ459412 0.000000 0.000000 0.000000 0.000000
2     DQ459413 1.584963 2.358379 4.392317 3.085722
3     DQ459415 0.000000 0.000000 0.000000 0.000000
4     DQ459418 0.000000 0.000000 0.000000 0.000000
5     DQ459419 0.100000 0.100000 4.000000 2.891544
6     DQ459420 0.000000 0.000000 0.000000 0.000000

Then I get this plot instead of your error.

You can let that 0.1 value approach zero; eventually it will no longer be able to calculate a distribution and you will get your error again.
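As a rough illustration of that limit, the sketch below replaces one zero in S1.x with a shrinking value (the `eps_values` chosen here are arbitrary) and prints the default bandwidth each time. The estimate shrinks along with the value and is exactly zero once the value itself is zero, which is the case kde2d rejects.

```r
library(MASS)

# Replace one zero in S1.x with eps and watch the default bandwidth shrink
eps_values <- c(0.1, 0.01, 0.001, 0)
bw <- sapply(eps_values, function(eps) {
  x <- c(0, 1.584963, 0, 0, eps, 0)
  MASS::bandwidth.nrd(x)
})

print(data.frame(eps = eps_values, bandwidth = bw))
```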

One general way to deal with this situation is to add a very small amount of noise to your data. This reflects the fact that any meaningful calculation based on real measurements from a continuous distribution should be insensitive to that small amount of noise.
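A minimal sketch of the noise approach, using base R's jitter(). The noise amount of 0.01 and the seed are arbitrary assumptions for illustration, not values from the original data.

```r
library(MASS)

set.seed(42)  # make the noise reproducible

s1x <- c(0, 1.584963, 0, 0, 0, 0)
s1y <- c(0, 2.358379, 0, 0, 0, 0)

# Add uniform noise in [-0.01, 0.01]; the amount is an arbitrary choice
s1x_noisy <- jitter(s1x, amount = 0.01)
s1y_noisy <- jitter(s1y, amount = 0.01)

# The rule-of-thumb bandwidths are now positive, so kde2d runs
h <- c(MASS::bandwidth.nrd(s1x_noisy), MASS::bandwidth.nrd(s1y_noisy))
dens <- MASS::kde2d(s1x_noisy, s1y_noisy, h = h, n = 50)

print(h)
```

Note that with this few distinct values the resulting bandwidth is driven mostly by the noise amplitude, so the contours largely reflect that choice.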

Hope that helps.

The answer by @Mike Wise is pretty solid, and my answer is somewhat complementary to it. Actually, the bandwidth.nrd function computes the difference between the 3rd and the 1st quartile, not the 2nd and the 1st (code from the function):

r <- quantile(distances, c(0.25, 0.75))
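To see what that line implies for the data in the question, here is a minimal sketch that reproduces the rule of thumb by hand. The formula is written from my reading of the MASS::bandwidth.nrd source, and `manual_bandwidth` is a hypothetical name used only for this illustration.

```r
library(MASS)

# Rule of thumb as I understand it from the MASS::bandwidth.nrd source
manual_bandwidth <- function(x) {
  r <- quantile(x, c(0.25, 0.75))   # 1st and 3rd quartiles
  h <- (r[2] - r[1]) / 1.34         # scaled interquartile range
  4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}

s1x <- c(0, 1.584963, 0, 0, 0, 0)   # both quartiles are 0
s2x <- c(0, 4.392317, 0, 0, 4, 0)   # 3rd quartile is 3

print(manual_bandwidth(s1x))
print(manual_bandwidth(s2x))
```

Because the estimate takes the minimum of the standard deviation and the scaled interquartile range, a zero interquartile range forces a zero bandwidth regardless of the variance.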

Instead of adding random noise to your data, I would suggest precomputing the bandwidths yourself and passing them to the function, replacing any zero value like so:

library(MASS)
# Fall back to a bandwidth of 0.1 when the rule-of-thumb estimate is zero
kde2d(df$S1.x, df$S1.y, 
      h = c(ifelse(bandwidth.nrd(df$S1.x) == 0, 0.1, bandwidth.nrd(df$S1.x)),
            ifelse(bandwidth.nrd(df$S1.y) == 0, 0.1, bandwidth.nrd(df$S1.y))))
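The same bandwidths can be handed to the plot layer from the question. This is a minimal sketch, assuming that geom_density_2d() forwards an `h` argument (a bandwidth vector of length two) to stat_density_2d(); `safe_bandwidth` and its 0.1 fallback are hypothetical and only mirror the ifelse() logic above.

```r
library(MASS)

df <- data.frame(
  S1.x = c(0, 1.584963, 0, 0, 0, 0),
  S1.y = c(0, 2.358379, 0, 0, 0, 0)
)

# Hypothetical helper: rule-of-thumb bandwidth with a fixed fallback
safe_bandwidth <- function(x, fallback = 0.1) {
  bw <- MASS::bandwidth.nrd(x)
  if (bw > 0) bw else fallback
}

h <- c(safe_bandwidth(df$S1.x), safe_bandwidth(df$S1.y))

# Only build the plot when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = S1.x, y = S1.y)) +
    geom_point() +
    geom_density_2d(h = h)
}
```

The fallback value sets how wide the contours around each point appear, so it is worth choosing it relative to the scale of the data rather than keeping 0.1 blindly.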

Hope this helps.


 