I am trying to fit a polynomial to a set of data. Sometimes the covariance matrix returned by numpy.polyfit consists only of inf, even though the fit itself seems fine. There are no numpy.inf or numpy.nan values in the data!
Example:
import numpy as np
# sample data; doesn't really show x**2-like behaviour,
# but that should be visible in the fit results
x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]
fit, cov = np.polyfit(x, y, 2, cov=True)
print('fit: ', fit)
print('cov: ', cov)
Result:
fit: [ 1.67867158e-06 5.69199547e-04 8.85146009e-01]
cov: [[ inf inf inf]
[ inf inf inf]
[ inf inf inf]]
np.cov(x,y)
gives
[[ 6.25000000e+01 -6.07388099e-02]
[ -6.07388099e-02 5.92268942e-05]]
So np.cov is not the same as the covariance returned by np.polyfit. Does anybody have an idea what's going on?
EDIT: I now understand that numpy.cov is not what I want. I need the variances of the polynomial coefficients, but I don't get them when (len(x) - order - 2.0) == 0. Is there another way to get the variances of the fitted polynomial coefficients?
As rustil's answer says, this is caused by the bias correction applied to the denominator of the covariance equation, which results in a zero divide for this input. The reasoning behind this correction is similar to that behind Bessel's correction. It is really a sign that there are too few data points to estimate the covariance in a well-defined way.
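To make the zero divide concrete: in the numpy source quoted in the other answer, the scale factor's denominator is `len(x) - order - 2.0`, where `order = deg + 1`. For the five data points and degree-2 fit above that is exactly zero:

```python
# Denominator of numpy's (old) covariance scale factor for this input
x = [-449., -454., -459., -464., -469.]
deg = 2
order = deg + 1                       # as in the numpy polyfit source
denominator = len(x) - order - 2.0    # 5 - 3 - 2 = 0 -> zero divide -> inf
print(denominator)  # 0.0
```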
How to skirt this problem? Well, this version of polyfit accepts weights. You could add another data point but weight it at epsilon. This is equivalent to reducing the 2.0 in this formula to a 1.0.
import sys
import numpy as np

x = [-449., -454., -459., -464., -469.]
y = [0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]
x_extra = x + x[-1:]   # duplicate the last data point
y_extra = y + y[-1:]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, sys.float_info.epsilon]
fit, cov = np.polyfit(x, y, 2, cov=True)
fit_extra, cov_extra = np.polyfit(x_extra, y_extra, 2, w=weights, cov=True)
print(fit == fit_extra)
print(cov_extra)
The output. Note that the fit values are identical:
>>> print(fit == fit_extra)
[ True True True]
>>> print(cov_extra)
[[ 8.84481850e-11 8.11954338e-08 1.86299297e-05]
[ 8.11954338e-08 7.45405039e-05 1.71036963e-02]
[ 1.86299297e-05 1.71036963e-02 3.92469307e+00]]
I am very uncertain that this will be especially meaningful, but it's a way to work around the problem, if a bit of a kludge. For something more robust, you could modify polyfit to accept its own ddof parameter, perhaps in lieu of the boolean that cov currently accepts. (I just opened an issue to suggest as much.)
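In the meantime, here's a rough sketch of what such a ddof-aware fit could look like. Note that `polyfit_cov` is a hypothetical helper, not part of numpy, and it skips the column scaling that numpy's polyfit does for numerical stability:

```python
import numpy as np

def polyfit_cov(x, y, deg, ddof):
    """Least-squares polynomial fit returning coefficients and a covariance
    estimate scaled by resids / (len(x) - ddof).
    Hypothetical helper, not part of numpy; no column scaling."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, deg + 1)                     # Vandermonde design matrix
    coeffs, resids, rank, sv = np.linalg.lstsq(X, y, rcond=None)
    base = np.linalg.inv(X.T @ X)                 # unscaled covariance
    scale = resids[0] / (len(x) - ddof) if resids.size else 0.0
    return coeffs, base * scale

# With ddof=4 the denominator is 5 - 4 = 1, so the covariance stays finite
coeffs, cov = polyfit_cov([-449., -454., -459., -464., -469.],
                          [0.9677024, 0.97341953, 0.97724978,
                           0.98215678, 0.9876293], 2, ddof=4)
```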
A quick final note about the calculation of cov: if you look at the Wikipedia page on least squares regression, you'll see that the simplified formula for the covariance of the coefficients is inv(dot(dot(X.T, W), X)), which has a corresponding line in the numpy code -- at least roughly speaking. In this case, X is the Vandermonde matrix, and the weights have already been multiplied in. The numpy code also does some scaling (which I understand; it's part of a strategy to minimize numerical error) and multiplies the result by the norm of the residuals (which I don't understand; I can only guess that it's part of another version of the covariance formula).
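As a rough, unweighted sketch of that simplified formula (ignoring numpy's column scaling and the residual factor; X here is the Vandermonde matrix of the question's data):

```python
import numpy as np

x = np.array([-449., -454., -459., -464., -469.])
X = np.vander(x, 3)                    # Vandermonde matrix for a degree-2 fit
cov_unscaled = np.linalg.inv(X.T @ X)  # (X^T X)^-1, weights omitted
print(cov_unscaled.shape)  # (3, 3)
```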
The difference is in the degrees of freedom. The polyfit method already takes into account that your degree is 2, thus causing:
RuntimeWarning: divide by zero encountered in true_divide
fac = resids / (len(x) - order - 2.0)
You can pass np.cov a ddof= keyword (ddof = delta degrees of freedom) and you'll run into the same problem.
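For example, np.cov divides by N - ddof, so with the question's five samples a ddof of 5 hits the same zero divisor (warnings suppressed here so the result is visible):

```python
import numpy as np
import warnings

x = [-449., -454., -459., -464., -469.]
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    c = np.cov(x, ddof=5)  # divisor N - ddof = 5 - 5 = 0
print(c)  # inf
```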