
Analytical gradient for bisection method nested within objective function

I'm attempting to fit parameters to a data set using optim() in R. The objective function requires iterative root-solving for an equation G (nested within the objective function), so that the predicted values p bring G to 0 (or as close to 0 as possible; I use 50 iterations of the bisection method for stability).

Here is the problem: I would really prefer to supply an analytical gradient to optim(), but I suspect that isn't possible for an iterated function. However, before I give up on the analytical gradient, I wanted to run this problem by everyone here and see if there is a solution I'm overlooking. Any thoughts?

Note: before settling on the bisection method, I tried other root-solving methods, but all non-bracketing methods (Newton's method, etc.) seem to be unstable.

Below is a reproducible example of the problem. With the provided data set and the starting values for optim(), the algorithm converges just fine without an analytical gradient, but it doesn't perform as well for other data sets and starting values.

#the data set includes two input variables (x1 and x2)
#the response values are k successes out of n trials
x1=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  1, 1, 1, 1, 1.5, 1.5, 1.5, 1.5, 1.75, 1.75, 1.75, 1.75, 2, 2, 
  2, 2, 2.25, 2.25, 2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75, 
  2.75, 2.75, 3, 3, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5, 3.5, 
  3.5, 3.75, 3.75, 3.75, 3.75, 4, 4, 4, 4, 4.25, 4.25, 4.25, 4.25, 
  4.5, 4.5, 4.5, 4.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1.5, 1.5, 1.5, 1.5, 
  1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 
  1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.75, 1.75, 
  1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 
  1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 
  1.75, 1.75, 1.75, 1.75, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2.25, 2.25, 2.25, 
  2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 
  2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 
  2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 
  2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 
  2.5, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 
  2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 
  2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 
  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
  3, 3, 3, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 
  3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 
  3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5, 
  3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 
  3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 
  3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 
  3.75, 3.75, 3.75, 3.75, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
  4, 4, 4, 4, 4, 4, 4, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 
  4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 
  4.25, 4.25, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 
  4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5)
x2=c(0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.15, 0.15, 0.15, 
  0.15, 0.2, 0.2, 0.2, 0.2, 0.233, 0.233, 0.233, 0.267, 0.267, 
  0.267, 0.267, 0.3, 0.3, 0.3, 0.3, 0.333, 0.333, 0.333, 0.333, 
  0.367, 0.367, 0.367, 0.367, 0.4, 0.4, 0.4, 0.4, 0.433, 0.433, 
  0.433, 0.433, 0.467, 0.467, 0.467, 0.5, 0.5, 0.5, 0.5, 0.55, 
  0.55, 0.55, 0.55, 0.6, 0.6, 0.6, 0.6, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 
  0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 
  0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 
  0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 
  0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 
  0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 
  0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 
  0.4, 0.4, 0.4, 0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 
  0.55, 0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 
  0.3, 0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 
  0.433, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 
  0.1, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 
  0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.467, 0.467, 0.467, 0.467, 
  0.55, 0.55, 0.55, 0.55, 0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 
  0.233, 0.233, 0.3, 0.3, 0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 
  0.433, 0.433, 0.433, 0.433, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 
  0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 
  0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.467, 
  0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 0.15, 0.15, 
  0.15, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 0.3, 0.367, 0.367, 
  0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 0.5, 0.5, 0.5, 0.6, 
  0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 
  0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 
  0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 
  0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 
  0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 
  0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 
  0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 
  0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 
  0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 
  0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 
  0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.15, 
  0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 
  0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433)
k=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 1, 3, 
    3, 3, 3, 3, 3, 4, 2, 5, 3, 4, 7, 8, 5, 4, 5, 5, 4, 5, 5, 5, 6, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 2, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 3, 2, 4, 1, 2, 
    3, 4, 2, 2, 4, 4, 3, 1, 2, 0, 3, 4, 5, 5, 0, 0, 0, 0, 0, 0, 1, 
    0, 0, 1, 1, 2, 1, 2, 2, 0, 3, 1, 0, 2, 4, 6, 5, 5, 4, 5, 5, 5, 
    1, 0, 0, 0, 2, 1, 0, 1, 3, 2, 1, 1, 3, 4, 3, 4, 5, 5, 5, 5, 8, 
    6, 7, 6, 6, 5, 7, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 2, 1, 1, 3, 3, 
    2, 1, 3, 6, 2, 5, 3, 3, 5, 6, 5, 5, 5, 1, 0, 1, 1, 2, 1, 1, 1, 
    3, 4, 2, 5, 5, 3, 4, 4, 6, 4, 6, 5, 6, 5, 5, 5, 5, 4, 5, 5, 0, 
    0, 0, 0, 0, 2, 0, 2, 3, 3, 3, 2, 3, 3, 1, 4, 4, 4, 4, 3, 5, 6, 
    5, 5, 5, 5, 5, 1, 4, 1, 2, 2, 3, 4, 2, 5, 5, 5, 5, 5, 4, 5, 7, 
    6, 7, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 0, 1, 0, 0, 3, 2, 3, 3, 1, 
    2, 2, 2, 4, 2, 3, 2, 5, 5, 5, 5, 4, 6, 5, 6, 5, 5, 6, 5, 3, 5, 
    2, 4, 5, 3, 5, 5, 6, 4, 4, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 5, 
    5, 5, 5, 0, 0, 2, 0, 3, 2, 3, 2, 3, 4, 3, 4, 5, 5, 5, 5, 6, 4, 
    6, 4, 5, 7, 5, 5, 5, 6, 5, 5, 2, 3, 4, 4, 4, 4, 5, 5, 5, 6, 5, 
    5, 5, 5, 5, 4, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 1, 0, 2, 0, 
    3, 5, 2, 2, 4, 5, 4, 5, 6, 6, 4, 5, 4, 5, 4, 5, 5, 5, 5, 5, 5, 
    6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 4, 1, 4, 4, 4, 4, 4, 3, 6, 5, 
    4, 3, 5, 4, 5, 6, 6, 5, 6, 5, 4, 5, 5, 5, 6, 5, 5, 5, 11, 5, 
    12, 5, 5, 5, 5, 4, 5, 5, 5)
n=c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 
  6, 5, 6, 5, 5, 5, 5, 7, 5, 6, 8, 8, 6, 5, 6, 5, 5, 5, 5, 5, 6, 
  5, 5, 5, 5, 7, 11, 8, 7, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 
  4, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 5, 5, 4, 5, 5, 5, 6, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 6, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 7, 6, 7, 6, 5, 5, 5, 5, 
  5, 5, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 
  8, 6, 7, 6, 6, 5, 7, 5, 5, 5, 5, 6, 5, 5, 5, 7, 7, 6, 5, 6, 5, 
  5, 5, 5, 6, 6, 4, 6, 6, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 7, 5, 5, 
  4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 4, 6, 5, 6, 5, 5, 5, 5, 4, 5, 5, 
  5, 5, 6, 6, 5, 6, 5, 4, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 
  6, 5, 5, 5, 5, 5, 5, 6, 5, 6, 7, 4, 6, 5, 5, 5, 5, 5, 5, 4, 5, 
  7, 6, 7, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 
  5, 5, 5, 5, 5, 4, 5, 6, 5, 5, 5, 5, 5, 7, 5, 6, 5, 5, 6, 5, 5, 
  5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 
  5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 
  5, 6, 5, 6, 7, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 5, 6, 5, 5, 5, 6, 
  5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 4, 5, 5, 5, 5, 5, 5, 5, 7, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 
  5, 5, 5, 5, 5, 5, 6, 6, 5, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 11, 
  5, 12, 5, 5, 5, 5, 4, 5, 5, 5)

#low_high contains the lower and upper bounds for the bisection method
low_high=vector("list",2)
low_high[["low"]]=rep(0,length(x1))
low_high[["high"]]=rep(1,length(x1))
#50 bisection steps halve the [0,1] bracket to width 2^-50 (~9e-16)
low_high_list=rep(list(low_high),50)

ll=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]

  #bisection function is used to find y which makes G=0
  bisection_function=function(prv,nxt)
  {
    low_high=prv
    #G and y are both vectors of the length of the data set (in this example, 469)
    y=(low_high[["low"]]+low_high[["high"]])/2
    G=-1+(x1/((log(-y/(y-1))-b1)/m1))+(x2/((log(-y/(y-1))-b2)/m2))
    low_high[["low"]][G>0]=y[G>0]
    low_high[["high"]][G<0]=y[G<0]
    return(low_high)
  }

  #Reduce is the fastest method I've found so far
  #(in other words, if there is a better way, I'm certainly open to suggestions!)
  low_high=Reduce(bisection_function,low_high_list)
  p=(low_high[["low"]]+low_high[["high"]])/2

  #sum of log likelihood for binomial distribution
  loglik=sum(log((gamma(1+n)/(gamma(1+k)*(gamma(1+n-k))))*((p^k)*((1-p)^(n-k)))))
  return(loglik)
}

theta.start=c(b1=-10,m1=10,b2=-10,m2=10)
mle=optim(theta.start,ll,control=list(fnscale=-1),hessian=TRUE)
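
As an aside (my note, not part of the original question): the log-likelihood line inside ll() can be computed more safely on the log scale with R's built-in dbinom(), which evaluates the same binomial log-probability without forming large gamma() values:

#numerically safer drop-in for the sum(log(...)) line in ll()
loglik=sum(dbinom(k,n,p,log=TRUE))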

Thanks!!

Using Vincent's suggestions, I was able to supply an analytic gradient via implicit differentiation. In case anyone else has a similar problem, I have included reproducible code below (to be run after the code included in the question).
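
For reference, the implicit-differentiation step that makes this work: the bisection output p satisfies G(p, theta) = 0 for each observation, so by the implicit function theorem and the chain rule (this is exactly what fd_any and dll compute in the code below),

\[
\frac{\partial p}{\partial \theta_i} = -\,\frac{\partial G/\partial \theta_i}{\partial G/\partial p},
\qquad
\frac{\partial \ell}{\partial \theta_i} = \sum_{j}\left(\frac{k_j}{p_j}-\frac{n_j-k_j}{1-p_j}\right)\frac{\partial p_j}{\partial \theta_i}.
\]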

#G rewritten as an expression in p (instead of y), so D() can differentiate it symbolically
Gexpression=parse(text="-1+(x1/((log(-p/(p-1))-b1)/m1))+(x2/((log(-p/(p-1))-b2)/m2))")

nested=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]

  #bisection function is used to find y which makes G=0
  bisection_function=function(prv,nxt)
  {
    low_high=prv
    #G and y are both vectors of the length of the data set (in this example, 469)
    y=(low_high[["low"]]+low_high[["high"]])/2
    G=-1+(x1/((log(-y/(y-1))-b1)/m1))+(x2/((log(-y/(y-1))-b2)/m2))
    low_high[["low"]][G>0]=y[G>0]
    low_high[["high"]][G<0]=y[G<0]
    return(low_high)
  }

  low_high=Reduce(bisection_function,low_high_list)
  p=(low_high[["low"]]+low_high[["high"]])/2
  return(p)
}
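
Quick usage check (my addition, not part of the original answer): nested() simply returns the root-solved p for a given parameter vector, so it can be called on its own; by construction every element lies strictly inside (0, 1).

p=nested(theta.start)
range(p)   #all values strictly between 0 and 1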

gr=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]
  p=nested(theta)

  # dll applies the chain rule: (derivative of the log-likelihood with respect
  #   to p) times the partial derivative of p with respect to a parameter (d_any)
  dll=function(d_any) (((k / p) * d_any) - (((n - k) / (1 - p))*d_any))

  #fd_any gives the partial derivative of p with respect to any parameter,
  #  via implicit differentiation of G(p,theta)=0: dp/dtheta = -(dG/dtheta)/(dG/dp)
  fd_any=function(any) eval(parse(text=paste("-((",as.character(list(D(Gexpression,any))),")/(",as.character(list(D(Gexpression,'p'))),"))",sep="")))

  DLb1=dll(fd_any("b1"))
  DLb2=dll(fd_any("b2"))
  DLm1=dll(fd_any("m1"))
  DLm2=dll(fd_any("m2"))

  DLb1[is.na(DLb1)]=0
  DLb2[is.na(DLb2)]=0
  DLm1[is.na(DLm1)]=0
  DLm2[is.na(DLm2)]=0

  colSums(cbind(b1=DLb1,m1=DLm1,b2=DLb2,m2=DLm2))
}
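
To actually use the analytic gradient, it must be passed to a gradient-based method; the default Nelder-Mead method ignores gr. A usage sketch (whether BFGS behaves better here will depend on the data, and the cross-check assumes the numDeriv package is installed):

#fit using the analytic gradient
mle2=optim(theta.start,ll,gr=gr,method="BFGS",control=list(fnscale=-1),hessian=TRUE)

#optional finite-difference cross-check of gr()
#numDeriv::grad(ll,theta.start)   #numerical gradient
#gr(theta.start)                  #analytic gradient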

hs=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]
  p=nested(theta)

  #dp/dtheta as a text expression, via implicit differentiation of G
  fd_any_fun=function(any) paste("(-((",as.character(list(D(Gexpression,any))),")/(",as.character(list(D(Gexpression,'p'))),")))",sep="")
  #per-observation score d(loglik)/dtheta as a text expression
  dll_fun=function(d_any_fun) paste("((k / p) * (",d_any_fun,")) - (((n - k) / (1 - p))*(",d_any_fun,"))",sep="")

  #h[i,j] = sum over observations of d2(loglik)/(dtheta_i dtheta_j),
  #  with NAs zeroed as in gr()
  pars=c("b1","m1","b2","m2")
  h=matrix(0,4,4,dimnames=list(pars,pars))
  for(i in pars)
    for(j in pars)
    {
      hij=eval(parse(text=D(parse(text=dll_fun(fd_any_fun(i))),j)))
      hij[is.na(hij)]=0
      h[i,j]=sum(hij)
    }
  return(h)
}

The gradient seems to work fine. For some reason, the Hessian matrix estimated by optim() differs from the analytic Hessian calculated by hs(). (A plausible explanation, offered as a conjecture: the symbolic second derivatives treat p as a constant symbol, so the chain-rule terms through dp/dtheta are dropped, whereas optim()'s finite-difference Hessian of ll() includes them.) At least the resulting standard errors are of the same order of magnitude:

# Standard errors from optim Hessian
sqrt(abs(diag(solve(mle$hessian))))
# Standard errors from analytic Hessian
sqrt(abs(diag(solve(hs(mle$par)))))
