
Why does scipy.optimize.minimize (default) report success without moving with Skyfield?

scipy.optimize.minimize using the default method returns the initial value as the result, without any error or warning messages. While using the Nelder-Mead method as suggested by this answer solves the problem, I would like to understand:

Why does the default method return the starting point as the answer without any warning, and is there a way I can protect against this kind of "wrong answer without warning" behavior in this case?

Note: the function separation uses the Python package Skyfield to generate the values to be minimized, and these values are not guaranteed to be smooth, which may be why the simplex method does better here.

RESULTS:

test result: [ 2.14159739 ]  'correct': 2.14159265359  initial: 0.0

default result: [ 10000. ]  'correct': 13054  initial: 10000

Nelder-Mead result: [ 13053.81011963 ]  'correct': 13054  initial: 10000

FULL OUTPUT using DEFAULT METHOD:
   status: 0
  success: True
     njev: 1
     nfev: 3
 hess_inv: array([[1]])
      fun: 1694.98753895812
        x: array([ 10000.])
  message: 'Optimization terminated successfully.'
      jac: array([ 0.])
      nit: 0

FULL OUTPUT using Nelder-Mead METHOD:
  status: 0
    nfev: 63
 success: True
     fun: 3.2179306044608054
       x: array([ 13053.81011963])
 message: 'Optimization terminated successfully.'
     nit: 28

Here is the full script:

def g(x, a, b):
    return np.cos(a*x + b)

def separation(seconds, lat, lon):
    lat, lon, seconds = float(lat), float(lon), float(seconds) # necessary it seems
    place = earth.topos(lat, lon)
    jd = JulianDate(utc=(2016, 3, 9, 0, 0, seconds))
    mpos = place.at(jd).observe(moon).apparent().position.km
    spos = place.at(jd).observe(sun).apparent().position.km
    mlen = np.sqrt((mpos**2).sum())
    slen = np.sqrt((spos**2).sum())
    sepa = ((3600.*180./np.pi) *
            np.arccos(np.dot(mpos, spos)/(mlen*slen)))
    return sepa

from skyfield.api import load, now, JulianDate
import numpy as np
from scipy.optimize import minimize

data = load('de421.bsp')

sun   = data['sun']
earth = data['earth']
moon  = data['moon']

x_init = 0.0
out_g = minimize(g, x_init, args=(1, 1))
print "test result: ", out_g.x, "'correct': ", np.pi-1, "initial: ", x_init    # gives right answer

sec_init = 10000
out_s_def = minimize(separation, sec_init, args=(32.5, 215.1))
print "default result: ", out_s_def.x, "'correct': ", 13054, "initial: ", sec_init

sec_init = 10000
out_s_NM = minimize(separation, sec_init, args=(32.5, 215.1),
                 method = "Nelder-Mead")
print "Nelder-Mead result: ", out_s_NM.x, "'correct': ", 13054, "initial: ", sec_init

print ""
print "FULL OUTPUT using DEFAULT METHOD:"
print out_s_def
print ""
print "FULL OUTPUT using Nelder-Mead METHOD:"
print out_s_NM

1)

Your function is piecewise constant (it has a small-scale "staircase" pattern), so it is not everywhere differentiable.

The gradient of the function at the initial guess is zero.

The default BFGS optimizer sees the zero gradient and decides it is a local minimum by its criteria (which are based on assumptions about the input function that are not true in this case, such as differentiability).

Basically, the exactly flat regions bomb the optimizer. The optimizer probes the function in the small exactly flat region around the initial point, where everything looks like the function is just a constant, so it thinks you gave it a constant function. Because your function is not differentiable everywhere, it is possible that almost all initial points are inside such flat regions, so that this can happen without bad luck in the choice of the initial point.

Note also that Nelder-Mead is not immune to this: it just happens that its initial simplex is larger than the size of the staircase, so it probes the function around a larger spot. If the initial simplex were smaller than the staircase size, the optimizer would behave similarly to BFGS.
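To make the failure mode concrete, here is a minimal, self-contained toy, assuming nothing about Skyfield: a piecewise-constant "bowl" built with np.floor. The default method reports success without moving, while Nelder-Mead's larger initial simplex straddles the flat treads and descends:

import numpy as np
from scipy.optimize import minimize

def staircase(x):
    # piecewise-constant "bowl": flat treads one unit wide, minimum near 3000
    return np.floor(np.abs(np.asarray(x).item() - 3000.0))

x0 = 1000.5   # inside a flat tread, not on a tread edge

print(minimize(staircase, x0).x)
# [ 1000.5] : the ~1e-8 finite-difference probe stays on one tread, the
# gradient estimate is exactly zero, and BFGS stops at the starting point

print(minimize(staircase, x0, method="Nelder-Mead").x)
# near 3000 : the default initial simplex spans about 50 units here,
# wide enough to see the slope of the staircase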

2)

General answer: local optimizers return local optima. Whether these coincide with the true optimum depends on the properties of the function.

In general, to see if you're stuck in a local optimum, try different initial guesses.
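A cheap way to act on that advice is a multi-start wrapper: run the optimizer from several initial guesses and keep the best result, so no single flat region can silently win. This is a sketch under stated assumptions: the multistart helper and the starting points are my own, and the toy staircase from the sketch above stands in for the real objective:

import numpy as np
from scipy.optimize import minimize

def staircase(x):   # toy objective from the sketch above
    return np.floor(np.abs(np.asarray(x).item() - 3000.0))

def multistart(f, inits, **kwargs):
    # run the optimizer from each starting point, keep the lowest minimum found
    results = [minimize(f, x0, **kwargs) for x0 in inits]
    return min(results, key=lambda r: r.fun)

best = multistart(staircase, [10.5, 1000.5, 5000.5], method="Nelder-Mead")
print(best.x, best.fun)   # typically lands in the flat basin around 3000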

Also, using a derivative-based optimizer on a non-differentiable function is not a good idea. If the function is differentiable on a "large" scale, you can adjust the step size of the numerical differentiation.
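For the step-size suggestion, scipy's derivative-based methods accept an eps option that sets the finite-difference step used for the numerical gradient. Widening it so the probe straddles a tread at least gets the optimizer moving on the toy staircase, though the flat treads still limit the final accuracy; a hedged sketch, reusing the toy function above:

import numpy as np
from scipy.optimize import minimize

def staircase(x):
    return np.floor(np.abs(np.asarray(x).item() - 3000.0))

# the default eps (~1.5e-8) sees a zero gradient; a 10-unit probe spans
# about ten treads, so the gradient estimate becomes informative
res = minimize(staircase, 1000.5, method="BFGS", options={"eps": 10.0})
print(res.x, res.fun)   # moves far from 1000.5, into the neighborhood of 3000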

Because there is no cheap, general way to check numerically whether a function is everywhere differentiable, no such check is done; instead, differentiability is an assumption of the optimization method that must be ensured by whoever supplies the objective function and chooses the optimization method.
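Since no automatic check exists, a crude manual probe can at least flag a suspiciously flat objective before you trust a success flag: evaluate the function at several step sizes around the starting point and see whether small steps change its value at all. A rough sketch; the flat_scales helper is my own, again using the toy staircase:

import numpy as np

def staircase(x):
    return np.floor(np.abs(x - 3000.0))

def flat_scales(f, x0, scales=np.logspace(-9, 0, 10)):
    # return the probe sizes at which the function value does not change
    f0 = f(x0)
    return [h for h in scales if f(x0 + h) == f0]

print(flat_scales(staircase, 1000.5))
# every probe smaller than the one-unit tread width leaves the value
# unchanged: a warning sign for any derivative-based method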

The accepted answer by @pv. explains that Skyfield has a "staircase" response, meaning that some values it returns are locally flat except for discrete jumps.

I did a little experiment on the first step, converting times to JulianDate objects, and indeed the quantization looks like roughly 40 microseconds per increment, or about 5e-10 days. That's reasonable, considering that the JPL databases span thousands of years. While this is probably fine for almost any general astronomical-scale application, it's not actually smooth. As the answer points out, the local flatness will trigger "success" in some (probably many) minimizers. This is expected and reasonable and is not in any way a failure of the method.

[figure: discrete time steps in Skyfield, produced by the script below]

from skyfield.api import load, now, JulianDate
import numpy as np
import matplotlib.pyplot as plt

t  = 10000 + np.logspace(-10, 2, 25)        # logarithmic spacing
jd = JulianDate(utc=(2016, 3, 9, 0, 0, t))

dt  = t[1:] - t[:-1]            # spacing of the input times, in seconds
djd = jd.tt[1:] - jd.tt[:-1]    # spacing of the resulting Julian dates, in days

t  = 10000 + np.linspace(0, 0.001, 1001)        # linear spacing
jd = JulianDate(utc=(2016, 3, 9, 0, 0, t))

plt.figure()

plt.subplot(1,2,1)

plt.plot(dt, djd)
plt.xscale('log')
plt.yscale('log')

plt.subplot(1,2,2)

plt.plot(t, jd.tt-jd.tt[0])

plt.show()

I cannot extol too highly the value of the print statement for seeing how an algorithm is behaving through time. If you try adding one to the top of your separation() function, then you will get to see the minimization routines work their way towards an answer:

def separation(seconds, lat, lon):
    print seconds
    ...

Adding this line will let you see that the Nelder-Mead method does a thorough search of the seconds range, striding forward in 500-second increments before it starts to close in:

[ 10000.]
[ 10500.]
[ 11000.]
[ 11500.]
[ 12500.]
...

Of course, it does not know these are 500-second increments, because to a solver like this, the problem has no units. These adjustments might be 500 meters, or 500 angstroms, or 500 years. But it stumbles blindly forward and, in the case of Nelder-Mead, sees enough of how the output varies with the input to home in on an answer you like.

Here, for contrast, is the entire search made by the default algorithm:

[ 10000.]
[ 10000.00000001]
[ 10000.]

That's it. It tries stepping slightly away by 1e-8 seconds, cannot see any difference in the answer it gets, and gives up, as several of the other answers here correctly asserted.

Sometimes you can fix up a situation like this by telling the algorithm to (a) take a bigger step to start with and (b) stop testing once the step size it is taking gets small, say, when it drops to a millisecond. You might try something like:

out_s_def = minimize(separation, sec_init, args=(32.5, 215.1),
                     tol=1e-3, options={'eps': 500})

In this case it still seems that the default minimization technique is too fragile to constructively find the minimum even when given this help, so we can do something else: we can tell the minimization function how many bits it really has to play with.

You see, these minimization routines are often written with fairly explicit knowledge of how far a 64-bit float can be pushed before no more precision is available, and they are all designed to stop before that point. But you are hiding the precision from them: you are telling the routine "give me a number of seconds", which makes it think it can fiddle with even the very tiniest low-end digits of the seconds value, when in reality the seconds are being combined not just with hours and days but with years, and in the process any tiny precision down at the bottom of the seconds is lost, though the minimizer does not know it!
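The numbers bear this out. A Julian date in 2016 is about 2.45 million days, so one float64 increment at that magnitude is about 5e-10 days, which is the ~40-microsecond staircase measured above, and a 1e-8-second nudge to the seconds simply vanishes once it is folded in. A quick check, with the date arithmetic hand-rolled rather than Skyfield's:

base = 2457456.5                               # Julian date of 2016-03-09 00:00 UT
jd        = base + 10000.0 / 86400.0           # plus 10000 seconds, in days
jd_nudged = base + (10000.0 + 1e-8) / 86400.0  # the optimizer's 1e-8 s probe

print(jd_nudged - jd)   # 0.0: the nudge is ~1.2e-13 days, far below the
                        # ~5e-10 day resolution of a float64 this large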

So let's expose the real floating-point time to the algorithm. In the process, I'll do a few things:

  1. Let's avoid the need for the float() maneuver you are doing. Our print statement shows the problem: even though you provided a scalar float, the minimizer turns it into a NumPy array anyway:

     (array([ 10000.]), 32.5, 215.1) 

    But that is easy to fix: now that Skyfield has a separation_from() built in that can handle arrays just fine, we will use it:

     sepa = mpos.separation_from(spos)
     return sepa.degrees
  2. I will switch to the new syntax for creating dates, which Skyfield has adopted as it heads towards 1.0.

That gives us something like this (but note that it would be faster if you built the topos only once and passed it in, instead of rebuilding it and making it redo its math every time):

ts = load.timescale()

...

def separation(tt_jd, lat, lon):
    place = earth.topos(lat, lon)
    t = ts.tt(jd=tt_jd)
    mpos = place.at(t).observe(moon).apparent()
    spos = place.at(t).observe(sun).apparent()
    return mpos.separation_from(spos).degrees

...

sec_init = 10000.0
jd_init = ts.utc(2016, 3, 9, 0, 0, sec_init).tt
out_s_def = minimize(separation, jd_init, args=(32.5, 215.1))

The result is a successful minimization, giving (I think, if you could double-check me here?) the answer you are looking for:

print ts.tt(jd=out_s_def.x).utc_jpl()

['A.D. 2016-Mar-09 03:37:33.8224 UT']

I hope soon to bundle a number of pre-built minimization routines in with Skyfield. In fact, a big reason for writing it to replace PyEphem was wanting to be able to unleash the powerful SciPy optimizers, and to be able to abandon the rather anemic ones that PyEphem implements in C. The main thing the bundled routines will have to be careful about is what happened here: an optimizer needs to be given floating-point digits to wiggle that are significant all the way down.

Maybe I should look into allowing Time objects to compose their times from two floating-point numbers, so that many more digits of the seconds can be represented. I think AstroPy has done this, and it is traditional in astronomy programming.
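For reference, here is the idea in miniature. This is a sketch of the representation only, not Skyfield's or AstroPy's actual API: keep the large whole part and the small fractional part of the date in separate floats, so tiny offsets land in the small component, where they are still representable.

jd1 = 2457456.5            # the big part: whole days
jd2 = 10000.0 / 86400.0    # the small part: fraction of a day, ~0.1157

# the same 1e-8 s nudge now lives entirely in jd2 and survives:
jd2_nudged = jd2 + 1e-8 / 86400.0
print(jd2_nudged - jd2)    # ~1.16e-13, no longer rounded away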
