
How to fit a normal cumulative distribution function to data

I have generated some data which is effectively a cumulative distribution. The code below gives an example of X and Y from my data:

X<- c(0.09787761, 0.10745590, 0.11815422, 0.15503521, 0.16887488, 0.18361325, 0.22166727,
0.23526786, 0.24198808, 0.25432602, 0.26387961, 0.27364063, 0.34864672, 0.37734113,
0.39230736, 0.40699061, 0.41063824, 0.42497043, 0.44176913, 0.46076456, 0.47229330,
0.53134509, 0.56903577, 0.58308938, 0.58417653, 0.60061901, 0.60483849, 0.61847521,
0.62735245, 0.64337353, 0.65783302, 0.67232004, 0.68884473, 0.78846000, 0.82793293,
0.82963446, 0.84392010, 0.87090024, 0.88384044, 0.89543314, 0.93899033, 0.94781219,
1.12390279, 1.18756693, 1.25057774)

Y<- c(0.0090, 0.0210, 0.0300, 0.0420, 0.0580, 0.0700, 0.0925, 0.1015, 0.1315, 0.1435,
0.1660, 0.1750, 0.2050, 0.2450, 0.2630, 0.2930, 0.3110, 0.3350, 0.3590, 0.3770, 0.3950,
0.4175, 0.4475, 0.4715, 0.4955, 0.5180, 0.5405, 0.5725, 0.6045, 0.6345, 0.6585, 0.6825,
0.7050, 0.7230, 0.7470, 0.7650, 0.7950, 0.8130, 0.8370, 0.8770, 0.8950, 0.9250, 0.9475,
0.9775, 1.0000)

plot(X,Y)

I would like to obtain the median, the mean and some quantile information (for example the 5% and 95% quantiles) from this data. The way I was thinking of doing this was to fit a defined distribution to it and then integrate to get my quantiles, mean and median values.
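
A sketch of an alternative that skips fitting altogether (this is not what the answer below does): because Y already gives the cumulative proportion at each X, the quantiles can be read off by linearly interpolating X against Y with base R's `approx()`, and a rough mean can be taken by treating each increase in Y as the probability mass sitting at that X. This assumes the points are a fine enough discretisation of the distribution. It makes no normality assumption, but it cannot say anything about probabilities below the first observed Y (0.009). X and Y are re-declared so the block runs on its own.

```r
# Data from the question: Y is the cumulative proportion observed at each X
X <- c(0.09787761, 0.10745590, 0.11815422, 0.15503521, 0.16887488, 0.18361325, 0.22166727,
       0.23526786, 0.24198808, 0.25432602, 0.26387961, 0.27364063, 0.34864672, 0.37734113,
       0.39230736, 0.40699061, 0.41063824, 0.42497043, 0.44176913, 0.46076456, 0.47229330,
       0.53134509, 0.56903577, 0.58308938, 0.58417653, 0.60061901, 0.60483849, 0.61847521,
       0.62735245, 0.64337353, 0.65783302, 0.67232004, 0.68884473, 0.78846000, 0.82793293,
       0.82963446, 0.84392010, 0.87090024, 0.88384044, 0.89543314, 0.93899033, 0.94781219,
       1.12390279, 1.18756693, 1.25057774)
Y <- c(0.0090, 0.0210, 0.0300, 0.0420, 0.0580, 0.0700, 0.0925, 0.1015, 0.1315, 0.1435,
       0.1660, 0.1750, 0.2050, 0.2450, 0.2630, 0.2930, 0.3110, 0.3350, 0.3590, 0.3770, 0.3950,
       0.4175, 0.4475, 0.4715, 0.4955, 0.5180, 0.5405, 0.5725, 0.6045, 0.6345, 0.6585, 0.6825,
       0.7050, 0.7230, 0.7470, 0.7650, 0.7950, 0.8130, 0.8370, 0.8770, 0.8950, 0.9250, 0.9475,
       0.9775, 1.0000)

# Quantiles: invert the empirical CDF by interpolating X as a function of Y
# (Y is strictly increasing here, so the inverse is well defined)
probs <- c(0.05, 0.50, 0.95)
q_emp <- approx(x = Y, y = X, xout = probs)$y
names(q_emp) <- paste0(probs * 100, "%")
print(q_emp)

# Mean: each step in Y is the probability mass at the corresponding X
mass <- diff(c(0, Y))
mean_emp <- sum(X * mass)
print(mean_emp)
```

With this data the interpolated median comes out at about 0.587, which can be compared with whatever a fitted distribution gives.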

The question is how to fit the most appropriate cumulative distribution function to this data (I expect this may well be the normal cumulative distribution function).

I have seen lots of ways to fit a PDF, but I can't find anything on fitting a CDF.
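
Since fitting a CDF to (X, Y) pairs is an ordinary nonlinear least-squares problem, base R's `nls()` can fit `pnorm()` to the points directly. A minimal sketch, assuming the normal CDF is the right family: the starting values `mu = 0.5` and `sigma = 0.3` are rough guesses (Y crosses 0.5 at about X = 0.59), and `algorithm = "port"` is used only because it accepts a `lower` bound that keeps `sigma` positive. X and Y are re-declared so the block runs on its own.

```r
# Data from the question
X <- c(0.09787761, 0.10745590, 0.11815422, 0.15503521, 0.16887488, 0.18361325, 0.22166727,
       0.23526786, 0.24198808, 0.25432602, 0.26387961, 0.27364063, 0.34864672, 0.37734113,
       0.39230736, 0.40699061, 0.41063824, 0.42497043, 0.44176913, 0.46076456, 0.47229330,
       0.53134509, 0.56903577, 0.58308938, 0.58417653, 0.60061901, 0.60483849, 0.61847521,
       0.62735245, 0.64337353, 0.65783302, 0.67232004, 0.68884473, 0.78846000, 0.82793293,
       0.82963446, 0.84392010, 0.87090024, 0.88384044, 0.89543314, 0.93899033, 0.94781219,
       1.12390279, 1.18756693, 1.25057774)
Y <- c(0.0090, 0.0210, 0.0300, 0.0420, 0.0580, 0.0700, 0.0925, 0.1015, 0.1315, 0.1435,
       0.1660, 0.1750, 0.2050, 0.2450, 0.2630, 0.2930, 0.3110, 0.3350, 0.3590, 0.3770, 0.3950,
       0.4175, 0.4475, 0.4715, 0.4955, 0.5180, 0.5405, 0.5725, 0.6045, 0.6345, 0.6585, 0.6825,
       0.7050, 0.7230, 0.7470, 0.7650, 0.7950, 0.8130, 0.8370, 0.8770, 0.8950, 0.9250, 0.9475,
       0.9775, 1.0000)

# Nonlinear least squares: choose mu and sigma so that pnorm(X, mu, sigma)
# is as close as possible to the observed cumulative proportions Y
fit <- nls(Y ~ pnorm(X, mean = mu, sd = sigma),
           data = data.frame(X = X, Y = Y),
           start = list(mu = 0.5, sigma = 0.3),
           algorithm = "port",
           lower = c(mu = -Inf, sigma = 1e-6))  # keep sigma > 0

cf <- coef(fit)
print(cf)
```

`summary(fit)` also prints standard errors, but they assume independent errors, which successive cumulative proportions do not have, so they are best treated as rough.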

(I realise this may seem a basic question to many of you, but it has me struggling!!)

Thanks in advance

Perhaps you could use nlm to find the parameters that minimize the sum of squared differences between your observed Y values and those expected under a normal distribution. Here is an example using your data:

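fn <- function(x) {
   # x[1] is the mean; x[2] is log(sigma), so sigma = exp(x[2]) is always positive
   mu <- x[1]
   sigma <- exp(x[2])
   # sum of squared differences between observed Y and the normal CDF at X
   sum((Y - pnorm(X, mu, sigma))^2)
}
# minimise starting from mu = 1, log(sigma) = 1
est <- nlm(fn, c(1, 1))$estimate

plot(X, Y)
# overlay the fitted normal CDF (back-transform log(sigma) with exp)
curve(pnorm(x, est[1], exp(est[2])), add = TRUE)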

Unfortunately, I don't know an easy way with this method to constrain sigma > 0 without doing the exp transformation on the variable. But the fit seems reasonable.
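
For what it's worth, base R's `optim()` accepts box constraints with `method = "L-BFGS-B"`, so sigma can be optimised directly with a small positive lower bound instead of through `exp()`. The sketch below uses the same least-squares objective as the answer (called `sse` here); the bound `1e-6` and the starting values are arbitrary choices. It then reads off the summaries the question asked for: under a normal fit the mean and the median are both mu, and quantiles come straight from `qnorm()`, so no numerical integration is needed.

```r
# Data from the question
X <- c(0.09787761, 0.10745590, 0.11815422, 0.15503521, 0.16887488, 0.18361325, 0.22166727,
       0.23526786, 0.24198808, 0.25432602, 0.26387961, 0.27364063, 0.34864672, 0.37734113,
       0.39230736, 0.40699061, 0.41063824, 0.42497043, 0.44176913, 0.46076456, 0.47229330,
       0.53134509, 0.56903577, 0.58308938, 0.58417653, 0.60061901, 0.60483849, 0.61847521,
       0.62735245, 0.64337353, 0.65783302, 0.67232004, 0.68884473, 0.78846000, 0.82793293,
       0.82963446, 0.84392010, 0.87090024, 0.88384044, 0.89543314, 0.93899033, 0.94781219,
       1.12390279, 1.18756693, 1.25057774)
Y <- c(0.0090, 0.0210, 0.0300, 0.0420, 0.0580, 0.0700, 0.0925, 0.1015, 0.1315, 0.1435,
       0.1660, 0.1750, 0.2050, 0.2450, 0.2630, 0.2930, 0.3110, 0.3350, 0.3590, 0.3770, 0.3950,
       0.4175, 0.4475, 0.4715, 0.4955, 0.5180, 0.5405, 0.5725, 0.6045, 0.6345, 0.6585, 0.6825,
       0.7050, 0.7230, 0.7470, 0.7650, 0.7950, 0.8130, 0.8370, 0.8770, 0.8950, 0.9250, 0.9475,
       0.9775, 1.0000)

# Sum of squared differences between observed Y and the normal CDF at X;
# p[1] is mu and p[2] is sigma (used directly, no exp transform)
sse <- function(p) sum((Y - pnorm(X, mean = p[1], sd = p[2]))^2)

# L-BFGS-B supports bounds: mu is unbounded, sigma must stay at or above 1e-6
opt <- optim(par = c(0.5, 0.3), fn = sse,
             method = "L-BFGS-B",
             lower = c(-Inf, 1e-6))
mu <- opt$par[1]
sigma <- opt$par[2]

# For a normal, mean = median = mu; the 5% and 95% quantiles come from qnorm
probs <- c(0.05, 0.50, 0.95)
q_fit <- qnorm(probs, mean = mu, sd = sigma)
print(c(mean = mu, sd = sigma))
print(q_fit)
```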


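Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.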

 
© 2020-2024 STACKOOM.COM