R - Levenberg-Marquardt nonlinear least-squares fitting of Heligman-Pollard model parameters

Question

I am attempting to reproduce the results of a paper by Kostakis. In this paper an abridged mortality table is expanded to a complete life table using the Heligman-Pollard model. The model has 8 parameters which have to be fitted. The author used a modified Gauss-Newton algorithm; this algorithm (E04FDF) is part of the NAG library of computer programs. Shouldn't Levenberg-Marquardt yield the same set of parameters? What is wrong with my code or with my application of the LM algorithm?
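For reference, here is the model that the code below implements, written out as equations. This is only a restatement of the HP8(), qxPred() and nqxPred() functions, with the code's lowercase parameters a to h written as A to H:

```latex
\mathrm{HP8}(x) = \frac{q_x}{p_x}
  = A^{(x+B)^{C}}
  + D\,\exp\!\left[-E\left(\ln x - \ln F\right)^{2}\right]
  + G\,H^{x},
\qquad p_x = 1 - q_x

q_x = \frac{\mathrm{HP8}(x)}{1 + \mathrm{HP8}(x)},
\qquad
{}_{5}q_x = 1 - \prod_{k=0}^{4}\left(1 - q_{x+k}\right)
```

The code drops the middle (accident hump) term at x = 0 because ln 0 is undefined.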

library(minpack.lm)


## Heligman-Pollard is used to expand an abridged table.
## A nonlinear least-squares algorithm is used to fit the parameters to nqx observed over 5-year intervals (5qx)
AGE <- c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70)
MORTALITY <- c(0.010384069, 0.001469140, 0.001309318, 0.003814265, 0.005378395,
               0.005985625, 0.006741766, 0.009325056, 0.014149626, 0.021601755,
               0.034271934, 0.053836246, 0.085287751, 0.136549522, 0.215953304)

## Start values for the Heligman-Pollard formula (converged set: a=0.0005893,b=0.0043836,c=0.0828424,d=0.000706,e=9.927863,f=22.197312,g=0.00004948,h=1.10003)
## I perturbed one parameter ("a") to obtain start values; the converged set is listed above.
parStart <- list(a=0.0008893,b=0.0043836,c=0.0828424,d=0.000706,e=9.927863,f=22.197312,g=0.00004948,h=1.10003)

## The Heligman-Pollard formula (HP8) = qx/px, an 8-parameter equation
HP8 <-function(parS,x)
ifelse(x==0, parS$a^((x+parS$b)^parS$c) + parS$g*parS$h^x, 
             parS$a^((x+parS$b)^parS$c) + parS$d*exp(-parS$e*(log(x/parS$f))^2) +
                 parS$g*parS$h^x)

## Define qx = HP8/(1+HP8)
qxPred <- function(parS,x) HP8(parS,x)/(1+HP8(parS,x))

## Calculate nqx predicted by HP8 model (nqxPred(parStart,x))
nqxPred <- function(parS,x)
(1 -(1-qxPred(parS,x)) * (1-qxPred(parS,x+1)) *
    (1-qxPred(parS,x+2)) * (1-qxPred(parS,x+3)) *
    (1-qxPred(parS,x+4))) 

##Define Residual Function, the relative squared distance is minimized  
ResidFun <- function(parS, Observed,x) (nqxPred(parS,x)/Observed-1)^2

## Applying the nls.lm algo. 
nls.out <- nls.lm(par=parStart, fn = ResidFun, Observed = MORTALITY, x = AGE,
                  control = nls.lm.control(nprint=1,
                                           ftol = .Machine$double.eps,
                                           ptol = .Machine$double.eps,
                                           maxfev=10000, maxiter = 500))

summary(nls.out)


## The author used a modified Gauss-Newton algorithm; this algorithm (E04FDF) is part of the NAG library of computer programs.
## Shouldn't Levenberg-Marquardt yield the same set of parameters?

Answer 1

The bottom line here is that @Roland is absolutely right: this is a very ill-posed problem, and you shouldn't necessarily expect to get reliable answers. Below I've

cleaned up the code in a few small ways (this is just aesthetic)
changed ResidFun to return residuals, not squared residuals (the former is correct, but this doesn't make very much difference)
explored results from several different optimizers. It actually looks like the answer you're getting is better than the "converged parameters" you list above, which I'm assuming are the parameters from the original study (can you please provide a reference?).
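A toy illustration of why returning squared residuals changes the fit. It uses base R's optimize() in place of nls.lm(), on made-up data, and the two objectives simply mirror what nls.lm() minimizes: the sum of squares of whatever fn returns.

```r
# Toy data, made up for illustration (not from the question): fit one constant m.
y <- c(1, 2, 3, 10)

# nls.lm() minimizes sum(fn(par)^2), so fn should return plain residuals ...
resid_plain   <- function(m) y - m
# ... returning squared residuals instead turns the criterion into sum(r^4).
resid_squared <- function(m) (y - m)^2

# The objectives nls.lm() would effectively minimize in each case
obj_plain   <- function(m) sum(resid_plain(m)^2)
obj_squared <- function(m) sum(resid_squared(m)^2)

m_plain   <- optimize(obj_plain,   interval = c(0, 20))$minimum
m_squared <- optimize(obj_squared, interval = c(0, 20))$minimum

m_plain    # about 4 = mean(y), the ordinary least-squares estimate
m_squared  # about 5.16: the fourth-power criterion is pulled toward the outlier 10
```

The two criteria agree only when the model fits perfectly, which is why the difference is small for a good fit but can still move the optimum.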

Load package:

library(minpack.lm)

Data, as a data frame:

d <- data.frame(
   AGE = seq(0,70,by=5),
   MORTALITY=c(0.010384069, 0.001469140, 0.001309318, 0.003814265,
               0.005378395, 0.005985625, 0.006741766, 0.009325056,
               0.014149626, 0.021601755, 0.034271934, 0.053836246,
               0.085287751, 0.136549522, 0.215953304))

First view of the data:

library(ggplot2)
(g1 <- ggplot(d,aes(AGE,MORTALITY))+geom_point())
g1+geom_smooth()  ## with loess fit

Parameter choices:

Presumably these are the parameters from the original paper ...

parConv <- c(a=0.0005893,b=0.0043836,c=0.0828424,
             d=0.000706,e=9.927863,f=22.197312,g=0.00004948,h=1.10003)

Perturbed parameters:

parStart <- parConv
parStart["a"] <- parStart["a"]+3e-4

The formulae:

HP8 <-function(parS,x)
    with(as.list(parS),
         ifelse(x==0, a^((x+b)^c) + g*h^x, 
                a^((x+b)^c) + d*exp(-e*(log(x/f))^2) + g*h^x))
## Define qx = HP8/(1+HP8)
qxPred <- function(parS,x) {
    h <- HP8(parS,x)
    h/(1+h)
}
## Calculate nqx predicted by HP8 model (nqxPred(parStart,x))
nqxPred <- function(parS,x)
    (1 -(1-qxPred(parS,x)) * (1-qxPred(parS,x+1)) *
     (1-qxPred(parS,x+2)) * (1-qxPred(parS,x+3)) *
     (1-qxPred(parS,x+4))) 
## Define residual function: relative residuals (nls.lm minimizes their sum of squares)
ResidFun <- function(parS, Observed,x) (nqxPred(parS,x)/Observed-1)

NB: this is changed slightly from the OP's version; nls.lm wants residuals, not squared residuals.
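nqxPred() turns single-year death probabilities into a five-year probability by multiplying the five one-year survival probabilities. A possible generalisation to an interval of any length n, as a sketch; nqx_from_q() is a hypothetical helper, not part of the original code:

```r
# Hypothetical helper (not in the original answer): probability of dying within
# n years of exact age x, given a function q() of single-year death probabilities.
# Survival over the interval is the product of the yearly survival probabilities.
nqx_from_q <- function(q, x, n = 5) {
  surv <- vapply(x, function(x0) prod(1 - q(x0 + 0:(n - 1))), numeric(1))
  1 - surv
}

# Check with a constant yearly probability of 0.1: 1 - 0.9^5 = 0.40951
nqx_from_q(function(age) rep(0.1, length(age)), x = c(0, 5))
```

With the definitions in this answer, nqx_from_q(function(age) qxPred(parStart, age), d$AGE) should reproduce nqxPred(parStart, d$AGE).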

A sum-of-squares function for use with other optimizers:

ssqfun <- function(parS, Observed, x) {
   sum(ResidFun(parS, Observed, x)^2)
}

Applying nls.lm. (Not sure why ftol and ptol were lowered from sqrt(.Machine$double.eps) to .Machine$double.eps; the former is generally a practical limit to precision ...)
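The reason sqrt(.Machine$double.eps) is the usual floor: near a minimum the objective is roughly quadratic, so moving a parameter by delta changes the objective by only about delta^2, which disappears in double precision once delta falls below roughly sqrt(eps). A small base-R demonstration on a made-up function:

```r
# Double-precision machine epsilon and its square root
eps      <- .Machine$double.eps   # about 2.2e-16
sqrt_eps <- sqrt(eps)             # about 1.5e-08

# Toy objective with minimum value 1 at p = 1; near the minimum, a step of
# size delta changes it by delta^2.
f <- function(p) (p - 1)^2 + 1

f(1 + 1e-9) == f(1)   # TRUE: a step below sqrt_eps is invisible in f
f(1 + 1e-7) == f(1)   # FALSE: a step above sqrt_eps still changes f
```

So asking for parameter or function tolerances near eps itself mostly just forces extra iterations that cannot be resolved.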

nls.out <- nls.lm(par=parStart, fn = ResidFun,
                  Observed = d$MORTALITY, x = d$AGE,
                  control = nls.lm.control(nprint=0,
                                           ftol = .Machine$double.eps,
                                           ptol = .Machine$double.eps,
                                           maxfev=10000, maxiter = 1000))

parNLS <- coef(nls.out)

pred0 <- nqxPred(as.list(parConv),d$AGE)
pred1 <- nqxPred(as.list(parNLS),d$AGE)

dPred <- with(d,rbind(data.frame(AGE,MORTALITY=pred0,w="conv"),
               data.frame(AGE,MORTALITY=pred1,w="nls")))

g1 + geom_line(data=dPred,aes(colour=w))

The lines are indistinguishable, but the parameters have some big differences:

round(cbind(parNLS,parConv),5)
##     parNLS  parConv
## a  1.00000  0.00059
## b 50.46708  0.00438
## c  3.56799  0.08284
## d  0.00072  0.00071
## e  6.05200  9.92786
## f 21.82347 22.19731
## g  0.00005  0.00005
## h  1.10026  1.10003

d, f, g, h are close, but a, b, c differ by orders of magnitude and e differs by about 50%.

Looking at the original equations, what's happening here is that a^((x+b)^c) is getting set to a constant, because a is approaching 1: once a is approximately 1, b and c are essentially irrelevant.

Let's check the correlation (we need a generalized inverse because the matrix is so strongly correlated):

obj <- nls.out
vcov  <- with(obj,deviance/(length(fvec) - length(par)) * 
              MASS::ginv(hessian))

cmat <- round(cov2cor(vcov),1)
dimnames(cmat) <- list(letters[1:8],letters[1:8])
cmat

##      a    b    c    d    e    f    g    h
## a  1.0  0.0  0.0  0.0  0.0  0.0 -0.1  0.0
## b  0.0  1.0 -1.0  1.0 -1.0 -1.0 -0.4 -1.0
## c  0.0 -1.0  1.0 -1.0  1.0  1.0  0.4  1.0
## d  0.0  1.0 -1.0  1.0 -1.0 -1.0 -0.4 -1.0
## e  0.0 -1.0  1.0 -1.0  1.0  1.0  0.4  1.0
## f  0.0 -1.0  1.0 -1.0  1.0  1.0  0.4  1.0
## g -0.1 -0.4  0.4 -0.4  0.4  0.4  1.0  0.4
## h  0.0 -1.0  1.0 -1.0  1.0  1.0  0.4  1.0

This is not actually so useful: it really just confirms that lots of the variables are strongly correlated ...
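The +/-1.0 entries in the table above are what one sees when parameters cannot be separately identified. A small self-contained illustration with a made-up Jacobian in which one parameter only ever acts as a multiple of another; MASS::ginv() is the same generalized inverse used in the code above:

```r
library(MASS)  # for ginv(), the Moore-Penrose generalized inverse

# Made-up Jacobian in which parameter p2 only ever acts as 2 * p1,
# so the two are not separately identifiable.
J <- cbind(p1 = c(1, 2, 3, 4),
           p2 = c(2, 4, 6, 8),
           p3 = c(1, 0, 1, 0))
H <- crossprod(J)  # J'J, the Gauss-Newton approximation to the Hessian

# An ordinary inverse does not exist: solve() stops with a singularity error
inv_ok <- tryCatch({ solve(H); TRUE }, error = function(e) FALSE)
inv_ok  # FALSE

# The generalized inverse still exists; its correlation matrix shows
# a perfect correlation of 1 between p1 and p2.
V <- ginv(H)
dimnames(V) <- dimnames(H)  # ginv() drops the names
round(cov2cor(V), 2)
```

A correlation of exactly +/-1 means the data constrain only a combination of the parameters, which matches the behaviour of b, c, d, e, f and h above.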

library(optimx)
mvec <- c('Nelder-Mead','BFGS','CG','L-BFGS-B',
          'nlm','nlminb','spg','ucminf')
opt1 <- optimx(par=parStart, fn = ssqfun,
         Observed = d$MORTALITY, x = d$AGE,
               itnmax=5000,
               method=mvec,control=list(kkt=TRUE))
               ## control=list(all.methods=TRUE,kkt=TRUE)) ## Boom!

##         fvalues      method fns  grs itns conv KKT1 KKT2 xtimes
## 2 8.988466e+307        BFGS  NA NULL NULL 9999   NA   NA      0
## 3 8.988466e+307          CG  NA NULL NULL 9999   NA   NA      0
## 4 8.988466e+307    L-BFGS-B  NA NULL NULL 9999   NA   NA      0
## 5 8.988466e+307         nlm  NA   NA   NA 9999   NA   NA      0
## 7     0.3400858         spg   1   NA    1    3   NA   NA  0.064
## 8     0.3400858      ucminf   1    1 NULL    0   NA   NA  0.032
## 1    0.06099295 Nelder-Mead 501   NA NULL    1   NA   NA  0.252
## 6   0.009275733      nlminb 200 1204  145    1   NA   NA  0.708

This warns about bad scaling, and also finds a variety of different answers: only ucminf claims to have converged, but nlminb gets a better answer, and the itnmax parameter seems to be ignored ...

opt2 <- nlminb(start=parStart, objective = ssqfun,
         Observed = d$MORTALITY, x = d$AGE,                   
               control= list(eval.max=5000,iter.max=5000))

parNLM <- opt2$par

Finishes, but with a false convergence warning ...
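The status behind a warning like this can also be read off the object that nlminb() returns. A minimal sketch on a made-up quadratic objective (not the HP model), where convergence should be clean:

```r
# Made-up quadratic objective with its minimum (value 1) at p = c(3, 3)
obj <- function(p) sum((p - 3)^2) + 1

fit <- nlminb(start = c(1, 1), objective = obj)

# convergence is 0 when nlminb() reports success; message holds the PORT
# status string, which is where a "false convergence" diagnosis shows up.
fit$convergence
fit$message
fit$par  # close to c(3, 3)
```

For the HP fit, checking opt2$convergence and opt2$message in the same way shows whether the reported parameters should be trusted.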

round(cbind(parNLS,parConv,parNLM),5)

##     parNLS  parConv   parNLM
## a  1.00000  0.00059  1.00000
## b 50.46708  0.00438 55.37270
## c  3.56799  0.08284  3.89162
## d  0.00072  0.00071  0.00072
## e  6.05200  9.92786  6.04416
## f 21.82347 22.19731 21.82292
## g  0.00005  0.00005  0.00005
## h  1.10026  1.10003  1.10026

sapply(list(parNLS,parConv,parNLM),
       ssqfun,Observed=d$MORTALITY,x=d$AGE)
## [1] 0.006346250 0.049972367 0.006315034

It looks like nlminb and minpack.lm are getting similar answers, and are actually doing better than the originally stated parameters (by quite a bit):

pred2 <- nqxPred(as.list(parNLM),d$AGE)

dPred <- with(d,rbind(dPred,
               data.frame(AGE,MORTALITY=pred2,w="nlminb")))

g1 + geom_line(data=dPred,aes(colour=w))
ggsave("cmpplot.png")

[Image: data with fitted curves for each parameter set, saved above as cmpplot.png]

ggplot(data=dPred,aes(x=AGE,y=MORTALITY-d$MORTALITY,colour=w))+
   geom_line()+geom_point(aes(shape=w),alpha=0.3)
ggsave("residplot.png")

[Image: residuals for each parameter set, saved above as residplot.png]

Other things one could try would be: 其他可以尝试的事情是：

appropriate scaling, although a quick test of this doesn't seem to help that much
provide analytical gradients
use AD Model Builder
use the slice function from bbmle to explore whether the old and new parameters seem to represent distinct minima, or whether the old parameters are just a false convergence ...
get the KKT (Karush-Kuhn-Tucker) criterion calculators from optimx or related packages working for similar checks
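The idea behind the slice check can be sketched in base R by evaluating the objective along the straight line between two parameter vectors. line_slice() below is a hypothetical stand-in for illustration, not bbmle's slice():

```r
# Hypothetical helper (a base-R stand-in, not bbmle's slice()): evaluate an
# objective along the straight line from parameter vector p0 to p1.
# Extra arguments in ... are passed on to fn.
line_slice <- function(fn, p0, p1, n = 51, ...) {
  t <- seq(0, 1, length.out = n)
  vals <- vapply(t, function(ti) fn(p0 + ti * (p1 - p0), ...), numeric(1))
  data.frame(t = t, value = vals)
}

# Toy objective with two separate minima, at p = -1 and p = +1
toy <- function(p) sum((p^2 - 1)^2)

s <- line_slice(toy, p0 = -1, p1 = 1)
range(s$value)  # 0 at both ends, up to 1 at t = 0.5 (p = 0): a barrier
```

With the objects from this answer, line_slice(ssqfun, parConv, parNLS, Observed = d$MORTALITY, x = d$AGE) would trace the SSE between the published and the nls.lm parameters; a rise in the middle would hint at two distinct minima, while a monotone decrease would be consistent with the published values being a false convergence. A straight line is only one path, so this is a rough check.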

PS: the largest deviations (by far) are for the oldest age classes, which probably also have small samples. From a statistical point of view it would probably be worth doing a fit weighted by the precision of the individual points ...
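One way to set up such a weighted fit is to divide each residual by the binomial standard error of the observed probability, sqrt(q(1 - q)/N). The question does not give the number at risk N in each age class, so n_at_risk below is a hypothetical input, and weighted_resid() is an illustrative helper, not part of minpack.lm:

```r
# Hypothetical helper: residuals divided by the binomial standard error of each
# observed probability, assuming n_at_risk[i] people were exposed in class i.
weighted_resid <- function(pred, obs, n_at_risk) {
  se <- sqrt(obs * (1 - obs) / n_at_risk)
  (pred - obs) / se
}

# For comparison, the relative residual used above, pred/obs - 1,
# equals (pred - obs)/obs, i.e. it implicitly weights by 1/obs.
rel_resid <- function(pred, obs) pred / obs - 1

weighted_resid(pred = c(0.11, 0.20), obs = c(0.10, 0.20), n_at_risk = c(100, 400))
# first element is 0.01 / 0.03 = 1/3, second is 0
```

In the fitting code, fn could then be something like function(parS, Observed, x, n) weighted_resid(nqxPred(parS, x), Observed, n), with n passed through the ... argument of nls.lm() in the same way as Observed and x.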

Answer 2

@BenBolker, I fitted the parameters to the entire dataset (the underlying qx values), and I am still not able to reproduce the parameters.

library(minpack.lm)

library(ggplot2)

library(optimx)

getwd()

d <- data.frame(AGE = seq(0,74), MORTALITY=c(869,58,40,37,36,35,32,28,29,23,24,22,24,28,
                                           33,52,57,77,93,103,103,109,105,114,108,112,119,
                                           125,117,127,125,134,134,131,152,179,173,182,199,
                                           203,232,245,296,315,335,356,405,438,445,535,594,
                                           623,693,749,816,915,994,1128,1172,1294,1473,
                                           1544,1721,1967,2129,2331,2559,2901,3203,3470,
                                           3782,4348,4714,5245,5646))


d$MORTALITY <- d$MORTALITY/100000

ggplot(d,aes(AGE,MORTALITY))+geom_point()  

##Not allowed to post Images

g1 <- ggplot(d,aes(AGE,MORTALITY))+geom_point()

g1+geom_smooth()## with loess fit

Reported parameters:

parConv <- c(a=0.0005893,b=0.0043836,c=0.0828424,d=0.000706,e=9.927863,f=22.197312,
             g=0.00004948,h=1.10003)

parStart <- parConv

parStart["a"] <- parStart["a"]+3e-4


## Heligman-Pollard formula (HP8); qxPred() below converts it to qx = HP8/(1+HP8)

HP8 <-function(parS,x)
with(as.list(parS),
ifelse(x==0, a^((x+b)^c) + g*h^x, a^((x+b)^c) + d*exp(-e*(log(x/f))^2) + g*h^x))



qxPred <- function(parS,x) {
  h <- HP8(parS,x)
  h/(1+h)
}



## Define residual function: relative residuals (nls.lm minimizes their sum of squares)
ResidFun <- function(parS, Observed,x) (qxPred(parS,x)/Observed-1)

ssqfun <- function(parS, Observed, x) {
  sum(ResidFun(parS, Observed, x)^2)
}

nls.out <- nls.lm(par=parStart, fn = ResidFun, Observed = d$MORTALITY, x = d$AGE, 
                  control = nls.lm.control(nprint=1, ftol = sqrt(.Machine$double.eps), 
                  ptol = sqrt(.Machine$double.eps), maxfev=1000, maxiter=1000))


parNLS <- coef(nls.out)

pred0 <- qxPred(as.list(parConv),d$AGE)
pred1 <- qxPred(as.list(parNLS),d$AGE)


# Bind the pred0 and pred1 data frames row-wise
dPred <- with(d,rbind(data.frame(AGE,MORTALITY=pred0,w="conv"),
      data.frame(AGE,MORTALITY=pred1,w="nls")))


g1 + geom_line(data=dPred,aes(colour=w))

round(cbind(parNLS,parConv),7)

mvec <- c('Nelder-Mead','BFGS','CG','L-BFGS-B','nlm','nlminb','spg','ucminf')
opt1 <- optimx(par=parStart, fn = ssqfun,
    Observed = d$MORTALITY, x = d$AGE,
    itnmax=5000,
    method=mvec, control=list(all.methods=TRUE,kkt=TRUE))
## control=list(all.methods=TRUE,kkt=TRUE)) ## Boom

get.result(opt1, attribute= c("fvalues","method", "grs", "itns",
           "conv", "KKT1", "KKT2", "xtimes"))

##       method       fvalues  grs itns conv KKT1 KKT2 xtimes
##5         nlm 8.988466e+307   NA   NA 9999   NA   NA      0
##4    L-BFGS-B 8.988466e+307 NULL NULL 9999   NA   NA      0
##2          CG 8.988466e+307 NULL NULL 9999   NA   NA   0.02
##1        BFGS 8.988466e+307 NULL NULL 9999   NA   NA      0
##3 Nelder-Mead     0.5673864   NA NULL    0   NA   NA   0.42
##6      nlminb     0.4127198  546   62    0   NA   NA   0.17


opt2 <- nlminb(start=parStart, objective = ssqfun,
    Observed = d$MORTALITY, x = d$AGE,
    control= list(eval.max=5000,iter.max=5000))

parNLM <- opt2$par

Check on parameters:

round(cbind(parNLS,parConv,parNLM),5)

##    parNLS  parConv   parNLM
##a  0.00058  0.00059  0.00058
##b  0.00369  0.00438  0.00369
##c  0.08065  0.08284  0.08065
##d  0.00070  0.00071  0.00070
##e  9.30948  9.92786  9.30970
##f 22.30769 22.19731 22.30769
##g  0.00005  0.00005  0.00005
##h  1.10084  1.10003  1.10084

SSE review:

sapply(list(parNLS,parConv,parNLM),
  ssqfun,Observed=d$MORTALITY,x=d$AGE)  

 ##[1] 0.4127198 0.4169513 0.4127198

I am not able to upload graphs, but the code is here. It still appears that the parameters found in the article are not the best fit when the complete mortality data (not abridged or subset) are used.

pred2 <- qxPred(as.list(parNLM),d$AGE)

dPred <- with(d,rbind(dPred,
    data.frame(AGE,MORTALITY=pred2,w="nlminb")))

g1 + geom_line(data=dPred,aes(colour=w))
ggplot(data=dPred,aes(x=AGE,y=MORTALITY-d$MORTALITY,colour=w)) +
        geom_line()+geom_point(aes(shape=w),alpha=0.3)

2 answers

Answer 1: score 12, 2013-07-28 15:52:42

Answer 2: score 0, 2013-07-30 13:13:56
