
Which polynomial regression degree is significant? Does it depend on the number of points or on other parameters?

I am studying the stability of numerical derivatives as a function of the step used to compute them. With a 15-point derivative (obtained by the finite-difference method), I get the following plot (each multipole "l" corresponds to a parameter on which the derivative depends, but that doesn't matter here):

[Figure: derivative computed with 15 points]

Now, I would like to compare this 15-point derivative with the derivatives computed with 3, 5 and 7 points. For this, I have plotted the relative differences (using absolute differences):

abs(f'_15_pts - f'_3_pts)/f'_3_pts for comparison between 15 and 3 points
abs(f'_15_pts - f'_5_pts)/f'_5_pts for comparison between 15 and 5 points
abs(f'_15_pts - f'_7_pts)/f'_7_pts for comparison between 15 and 7 points
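
For concreteness, this is how the three quantities above could be computed in NumPy; the f_prime_* arrays are hypothetical placeholders, not taken from the original code:

import numpy as np

# Hypothetical placeholder arrays standing in for the derivatives computed
# with the 3-, 5-, 7- and 15-point stencils on the same grid of steps h.
f_prime_3  = np.array([1.02, 0.98, 1.10, 0.95])
f_prime_5  = np.array([1.01, 0.99, 1.05, 0.97])
f_prime_7  = np.array([1.00, 1.00, 1.03, 0.98])
f_prime_15 = np.array([1.00, 1.00, 1.01, 0.99])

# Relative differences, taking the 3-, 5- and 7-point results as references,
# as in the formulas above (absolute values keep the later log10 well defined).
errorRelative_3_15 = np.abs(f_prime_15 - f_prime_3) / np.abs(f_prime_3)
errorRelative_5_15 = np.abs(f_prime_15 - f_prime_5) / np.abs(f_prime_5)
errorRelative_7_15 = np.abs(f_prime_15 - f_prime_7) / np.abs(f_prime_7)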

And my issue occurs when I want to do a polynomial regression on the relative variations above for the multipole l=366.42 (the problem remains the same for other multipoles).

For example, when I do a cubic regression (degree 3), I get the following plot:

[Figure: cubic regression fit]

I don't know exactly how to interpret these results: maybe it means that the relative error is largest between the 3-point and 15-point derivatives, and smaller between the 5-point and 15-point derivatives, as well as between the 7-point and 15-point derivatives.

Then, if I do for example a polynomial regression of degree 10, I get the following plot:

[Figure: degree-10 regression fit]

As you can see, this is totally different from the cubic regression above.

So I don't know which degree to take for the polynomial regression, i.e. which degree is relevant to get valid physical results: 3, 4, 6 or maybe 10. If I take too large a degree, the results aren't valid since I get Dirac-like peaks and straight lines.

I guess the right polynomial degree to keep depends on the initial number of points of the interpolated curve (140 points for the first figure) and maybe on other parameters.

In conclusion, could anyone tell me whether there is a criterion to determine which polynomial degree to apply? I mean the degree which is the most relevant from a relative-error point of view.

If I don't do any regression, I get the following plot, which is too blurred to interpret:

[Figure: raw plot of the relative errors]

That's why I would like to interpolate these data, to see more clearly the differences between the different relative evolutions.

PS: here are the code snippets for the polynomial regression:

import numpy as np
import numpy.polynomial.polynomial as poly
import matplotlib.pyplot as plt

# stepNewArray, errorRelative_*_15, stepArrayId and colorDerPlot are defined earlier in the script.
# Fit a degree-10 polynomial to each relative error, working in log-log space.
stepForFit = np.logspace(-8.0, -1.0, 10000)
coefs_3_15 = poly.polyfit(np.log10(stepNewArray), np.log10(errorRelative_3_15), 10)
ffit_3_15 = poly.polyval(np.log10(stepForFit), coefs_3_15)
coefs_5_15 = poly.polyfit(np.log10(stepNewArray), np.log10(errorRelative_5_15), 10)
ffit_5_15 = poly.polyval(np.log10(stepForFit), coefs_5_15)
coefs_7_15 = poly.polyfit(np.log10(stepNewArray), np.log10(errorRelative_7_15), 10)
ffit_7_15 = poly.polyval(np.log10(stepForFit), coefs_7_15)

# Plot the regression curves, converting back from log10 space
plt.plot(stepForFit[stepArrayId], np.power(10, ffit_3_15[stepArrayId]), colorDerPlot[0])
plt.plot(stepForFit[stepArrayId], np.power(10, ffit_5_15[stepArrayId]), colorDerPlot[1])
plt.plot(stepForFit[stepArrayId], np.power(10, ffit_7_15[stepArrayId]), colorDerPlot[2])

UPDATE 1: Given that I have no hypothesis (or model) about the value of the relative error, I cannot put a priori constraints on the degree of the polynomial that should best fit the data.

But maybe I have a clue, since the derivatives I have computed use 3, 5, 7 and 15 points, so their truncation errors are respectively O(h^2), O(h^4), O(h^6) and O(h^14).

For example, for the 3-point derivative, I have:

[Image: Taylor expansions used for the 3-point derivative]

and so the final expression of the derivative:

[Image: final 3-point derivative formula]
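
For reference, a minimal sketch of the standard first-derivative central-difference derivation (the textbook version, which may differ in detail from the expressions shown in the images above):

$f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + O(h^4)$

Subtracting the two expansions cancels the even-order terms,

$f(x+h) - f(x-h) = 2 h f'(x) + \frac{h^3}{3} f'''(x) + O(h^5)$,

and dividing by $2h$ gives

$f'(x) = \frac{f(x+h) - f(x-h)}{2h} + O(h^2)$,

so the apparent drop from $O(h^4)$ in the expansions to $O(h^2)$ in the final formula comes from dividing the surviving $h^3$ term by $2h$.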

By the way, I don't understand why we pass from $O(h^4)$ to $O(h^2)$ between these expressions.

But the main issue is that, for now, I have no hypothesis about the polynomial degree that I should apply.

Maybe I should test a range of polynomial degrees and compute the chi2 each time, so that the minimal chi2 gives me the right degree to take into account.
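
A minimal sketch of that idea (not my actual script): x and y are hypothetical placeholders for np.log10(stepNewArray) and np.log10(errorRelative_3_15), and the chi2 is taken as the plain sum of squared residuals:

import numpy as np
import numpy.polynomial.polynomial as poly

# Hypothetical log-log data standing in for the real arrays.
rng = np.random.default_rng(0)
x = np.linspace(-8.0, -1.0, 140)
y = 0.5 * x + 0.1 * rng.standard_normal(x.size)

best_degree, best_chi2 = None, np.inf
for degree in range(1, 16):
    coefs = poly.polyfit(x, y, degree)
    residuals = y - poly.polyval(x, coefs)
    chi2 = np.sum(residuals**2)          # unweighted chi2: sum of squared residuals
    if chi2 < best_chi2:
        best_degree, best_chi2 = degree, chi2

print(best_degree, best_chi2)

As described in UPDATE 2 below, this plain residual sum keeps decreasing as the degree grows, so without an extra penalty or stopping criterion it tends to select the largest degree in the range.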

What do you think about this? Do NumPy or Python already provide specific functions for this kind of study?

UPDATE 2: I tried to determine, over a range of degrees from 1 to 15, the polynomial that best fits the data. My criterion was to fit a polynomial for each degree and then compute the chi2 between the "interpolated data" and the "experimental data". If the new chi2 is lower than the previous one, I update the degree to use for the polynomial regression.

Unfortunately, for each of the 3-, 5- and 7-point derivatives, this search for the "ideal degree" always returns the maximum degree of the explored interval.

OK, the chi2 is minimal for the highest degree, but this doesn't correspond to physical results. One shouldn't forget that below 10^-4 the behavior of Cl' is chaotic, so I don't expect a physical interpretation of the convergence of the derivatives as the number of points increases.

But the interesting area is above 10^-4, where I have more stability.

Given that my method of selecting the best degree as a function of the chi2 doesn't work (it always returns the maximal degree of the explored range), is there another method to get a nice fit? I know this is difficult because of the chaotic region at small steps.

One last thing: the cubic regression (degree 3) gives nice curves, but I don't understand why this only happens for degree 3 and not for higher degrees.

As someone said in the comments, for higher degrees the regression is overfitted: how can this be fixed?

I have to say that I find your question formulation very confusing, so I can only give a rather general answer. Perhaps you could split your big question into several smaller ones next time.

To start with, I assume that your question is: how does the number of points in a differentiation stencil matter when I do polynomial interpolation on the derivative afterwards?

The number of points in a stencil generally increases the accuracy of the computation of the derivative. You can see that by filling in Taylor expansions for the variables in the numerical derivative. After terms cancel, you are left with some higher-order term that gives the leading-order error you make. The underlying assumption, however, is that the function (in your case C) whose derivative you compute is smooth on the interval on which you compute the derivatives. That means that if your function is not nicely behaved on your 15-point stencil, then that derivative is essentially worthless.

The degree of the polynomial in polynomial regression is usually a free parameter chosen by the user, because the user might know that their series behaves like a polynomial up to a certain degree but not know the polynomial coefficients. If you know something about your data, you can set the degree yourself. For instance, if you know your data is linearly correlated with the step, you can set the degree to 1 and you have linear regression. In this case you don't want to specify any higher degree, because a higher-degree polynomial will likely still fit your data even though you know that is not its real behavior. In a similar way, if you know your data behaves like a polynomial of some degree, you certainly don't want to fit any higher. If you really have no clue what degree the polynomial should be, then you should make an educated guess. A good strategy is simply to plot the polynomial going through the data points, increasing the degree one at a time (see the sketch after this paragraph). You then want the line to go between the points, not to diverge towards specific points. If you have many outliers, there exist methods better suited than least squares.
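
As an illustration of that strategy, here is a small sketch (with hypothetical x, y data in log-log space, not the asker's real arrays) that overlays fits of increasing degree on the data, so one can see by eye when the curve starts chasing individual points:

import numpy as np
import numpy.polynomial.polynomial as poly
import matplotlib.pyplot as plt

# Hypothetical noisy log-log data; replace with the real log10(step)/log10(error) arrays.
rng = np.random.default_rng(1)
x = np.linspace(-8.0, -1.0, 140)
y = 0.5 * x + 0.3 * rng.standard_normal(x.size)

x_dense = np.linspace(x.min(), x.max(), 1000)
plt.plot(x, y, 'k.', label='data')
for degree in (1, 3, 5, 10):
    coefs = poly.polyfit(x, y, degree)
    plt.plot(x_dense, poly.polyval(x_dense, coefs), label=f'degree {degree}')
plt.legend()
plt.show()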

Now, on to your problem specifically.

  • There is no way to compute an optimal degree unless you have more information about your data. The degree is a hyperparameter. If you want an optimum for it, you need to add prior information, such as "I want the lowest-degree polynomial that fits the data with an error epsilon" (see the sketch after this list).
  • Overfitting is simply fixed by choosing a lower-degree polynomial. If that does not fix it, then least-squares regression is not for you. You need to look into a regression method that uses a different metric, or you need to preprocess your data, or you need a non-polynomial fit (fit a function of a certain shape, then use Levenberg-Marquardt, for instance).
  • A 15-point derivative looks very questionable; you most likely do not have this kind of smoothness in your data. If you have a good reason for it, tell us; otherwise just use 2 points for the first derivative, or 3 or 5 for the second.
  • The expression with the Landau symbol (big-O) is not switching from fourth order to second. If you subtract the two equations and divide by h^2, the O(h^4)/h^2 becomes O(h^2).
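
As a concrete, hypothetical version of the "lowest degree within an error epsilon" criterion mentioned in the first bullet, one could stop at the first degree whose largest residual falls below a chosen tolerance (the function name and the placeholder data are mine, not from the question):

import numpy as np
import numpy.polynomial.polynomial as poly

def lowest_degree_within_eps(x, y, eps, max_degree=15):
    """Return the smallest degree whose fit has max |residual| <= eps, else None."""
    for degree in range(1, max_degree + 1):
        coefs = poly.polyfit(x, y, degree)
        if np.max(np.abs(y - poly.polyval(x, coefs))) <= eps:
            return degree
    return None

# Hypothetical usage with placeholder data in log-log space.
rng = np.random.default_rng(2)
x = np.linspace(-8.0, -1.0, 140)
y = 0.5 * x + 0.05 * rng.standard_normal(x.size)
print(lowest_degree_within_eps(x, y, eps=0.2))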
