
Finding The Max Value from Plotted Density Function (Kernel Density Estimation) in R

I have some data that I'm assuming comes from a distribution, and I'm trying to estimate that distribution. Right now I'm using the package KernSmooth in R with a Gaussian kernel, and I'm using the package's dpik() function to select my bandwidth automatically. (I assume it uses AMISE or something of the sort; please let me know if there is a better automatic bandwidth selection process.)

What I'm interested in, though, is finding the x-value that corresponds to the highest peak in the distribution. This seems like a very simple thing, and something I put off as trivial earlier on, but to my frustration I'm hitting some snags. The bkde() function in KernSmooth passes back a set of (x, y) coordinates which map out the distribution the algorithm has estimated. I know I could simply do a linear search through the data to find the max y-value and grab the corresponding x-value but, as I am writing a function which may be called frequently in an automated process, I feel that is inefficient, especially since bkde() gives back a lot of values.
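For concreteness, the setup described above might look like the following minimal sketch. The faithful$waiting data and the variable names h and est are stand-ins chosen for illustration, not part of the original problem. According to the KernSmooth documentation, dpik() is a direct plug-in bandwidth selector.

```r
library(KernSmooth)

x <- faithful$waiting        # example data; stands in for the real sample

# Direct plug-in bandwidth for a Gaussian ("normal") kernel
h <- dpik(x, kernel = "normal")

# Binned kernel density estimate on the default grid of 401 points
est <- bkde(x, kernel = "normal", bandwidth = h)

str(est)                     # a list with components x (grid) and y (density)
```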

My other idea was to fit a curve to the estimate, take the derivative, and set it equal to zero, but that sounds like it may be inefficient as well. Maybe density() would be a better function to use here?

Please let me know if there is an efficient way to do this. I actually plan to do a little bit of inference on the distributions I find, such as finding the cutoff points that chop off a certain percentage of the tail on either side (i.e. confidence intervals) and finding the expected value. My vague plan now is to use some Monte Carlo techniques, or to draw from the distribution and use bootstrapping techniques to get an idea of the areas. Any help with methods for any of these would be greatly appreciated.
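As a rough illustration of the tail cutoffs and expected value mentioned above, one option that avoids simulation is to integrate the estimate numerically on the grid that bkde() returns. This is only a sketch under assumptions: the 2.5% tail level, the example data, and the names p, cdf, cutoffs and expected are all illustrative, and the result is only as accurate as the grid is fine.

```r
library(KernSmooth)

x <- faithful$waiting                  # example data; stands in for the real sample
est <- bkde(x, bandwidth = dpik(x))

# Treat the equally spaced grid values as weights that sum to 1
p <- pmax(est$y, 0)                    # guard against tiny negative values
p <- p / sum(p)
cdf <- cumsum(p)                       # approximate CDF on the grid

# x-values cutting off 2.5% in each tail, by interpolating the inverse CDF
cutoffs <- approx(cdf, est$x, xout = c(0.025, 0.975), ties = mean)$y

# Approximate mean of the estimated density
expected <- sum(est$x * p)

cutoffs
expected
```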

Using:

> require(KernSmooth)
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
> mod <- bkde(faithful$waiting)
> str(mod)
List of 2
 $ x: num [1:401] 22.7 23 23.2 23.4 23.7 ...
 $ y: num [1:401] 3.46e-08 1.17e-07 1.40e-07 1.68e-07 2.00e-07 ...

Is this not efficient enough?

> which(mod$y == max(mod$y))
[1] 245
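The index on its own is not yet the x-value asked about. A minimal sketch of the remaining step, assuming the same mod object as above, uses base R's which.max(), which returns the index of the first maximum in a single pass:

```r
library(KernSmooth)

mod <- bkde(faithful$waiting)

i <- which.max(mod$y)   # index of the (first) largest density value
mod$x[i]                # x-value at the highest peak
mod$y[i]                # estimated density at that peak
```

Unlike which(mod$y == max(mod$y)), which.max() always returns a single index, which is convenient when the result feeds into an automated process.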

density() does something similar, but by default it returns the density evaluated at 512 equally spaced values of x.

In both functions the number of points returned can be controlled. See the argument gridsize in bkde() and n in density(). Of course, the precision of the approach depends on the density of points at which the KDE is evaluated, so you won't want to set this too low.
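A small sketch of how the grid size affects the located peak, using example grid sizes picked only for illustration. Note that bkde() and density() choose their default bandwidths differently, so their peaks need not coincide exactly.

```r
library(KernSmooth)

x <- faithful$waiting

# Same estimator, coarse versus fine evaluation grid
coarse <- bkde(x, gridsize = 101L)
fine   <- bkde(x, gridsize = 2001L)

coarse$x[which.max(coarse$y)]   # peak location, resolved to the coarse grid
fine$x[which.max(fine$y)]       # peak location, resolved to the fine grid

# density() from the stats package, evaluated at 1024 points instead of 512
d <- density(x, n = 1024)
d$x[which.max(d$y)]
```

The peak location can only be resolved to within one grid step, so a finer grid buys precision at the cost of a longer vector to scan.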

My gut tells me you may spend an awful lot more time thinking up and implementing a more efficient approach than you would spend just going with the simple solution above.

