
How do I do multivariate non-linear regression in Python?

Let's say my actual equation is y = a * b + c.

So my data set looks like:

a b c y
2 5 4 14
3 7 2 23
1 7 4 15
4 1 7 11
3 2 1 7
1 2 3 5

And so forth. What module do I use in order to have an output that tells me "y = a * b + c"? Is this even possible?

How about y = a * a + b? Any pointers to documentation or an explanation of what I should try would be great.

Edit:

The suggested duplicate is clearly a different scenario. In that example a single formula describes a line; in my example there are many variables that together only approximately fit a result. The other question also does not talk about squared terms.

There is no module. Your general problem is "what simple function best fits this data?" There is no general solution, as "simple" requires proper definition and restriction to yield a meaningful answer.

A basic theorem of algebra shows that a data set of N points can be fitted by a polynomial of degree no more than N-1. Restricting the fit further than this requires that you define a search space and explore within that definition.
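For the single-variable case, here is a minimal sketch of that fact using numpy.polyfit (the values are arbitrary, chosen only to illustrate the point):

import numpy as np

# Four arbitrary points; a degree-3 (N-1) polynomial passes through all of them.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

coeffs = np.polyfit(x, y, deg=len(x) - 1)      # fit a degree N-1 polynomial
print(np.allclose(np.polyval(coeffs, x), y))   # True: the interpolation is exact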

Yes, there exist methods to set a maximum degree and work within that; you can write a loop to increase that degree until you find an exact solution.

I suggest that you look at the curve-fitting methods of Scikit and employ those in a solution of your own devising. You may need to work through all combinations of terms for your chosen degree, adding new terms each time you increase the degree. You may also need to write the exploration so that it considers those terms in the order of your defined complexity.
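As a rough illustration of that kind of search, here is a minimal sketch that expands the inputs into all degree-2 terms and fits a plain linear model on top. The data is synthetic, generated from y = a * b + c purely for illustration (it is not the table from the question), and get_feature_names_out assumes a recent scikit-learn release (older releases call it get_feature_names):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = a*b + c (illustration only).
rng = np.random.default_rng(0)
X = rng.integers(1, 10, size=(50, 3)).astype(float)   # columns: a, b, c
y = X[:, 0] * X[:, 1] + X[:, 2]

# Expand into all terms up to degree 2: a, b, c, a^2, a*b, a*c, b^2, b*c, c^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

model = LinearRegression().fit(X_poly, y)

# Print the terms whose coefficients are clearly non-zero; for this data
# only the "a b" and "c" terms should survive, recovering y = a*b + c.
for name, coef in zip(poly.get_feature_names_out(["a", "b", "c"]), model.coef_):
    if abs(coef) > 1e-6:
        print(name, round(coef, 3))

The loop over degrees described above is just a loop over the degree parameter here, and a sparse regularizer such as Lasso can stand in for LinearRegression when the list of candidate terms gets large.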


Response to OP comment:

I see; you're somewhat following in the footsteps of FiveThirtyEight.com, best known for accuracy with baseball and elections in the USA. Depending on the accuracy you want, this problem gets nasty very quickly. You get terms such as ((MY_OFF-OPP_DEF) ^ 1.28 + 2.1 - sqrt(OPP_GK)) / BLAH.

In any case, you're likely heading into a deep learning regression application, somewhat more complex than a "simple" sum-of-products scenario. You might get acceptable results with "mere" machine learning, but be prepared for disappointment even in the simpler task of predicting the winner.

Have you thought about giving the scikit-learn Gradient Boosting Regressor a try? Please refer to the user guide for code examples of how this method can be used on regression problems.
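For reference, a minimal sketch of what fitting GradientBoostingRegressor to the toy rows from the question could look like. Note that six rows are far too few for a real model, and a tree ensemble only predicts numbers; it will not hand back a formula such as "y = a * b + c":

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# The toy rows from the question; columns are a, b, c.
X = np.array([[2, 5, 4],
              [3, 7, 2],
              [1, 7, 4],
              [4, 1, 7],
              [3, 2, 1],
              [1, 2, 3]])
y = np.array([14, 23, 15, 11, 7, 5])

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1)
model.fit(X, y)

print(model.predict([[2, 5, 4]]))   # predicted y for a=2, b=5, c=4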

Please also note that the documentation states that:

scikit-learn 0.21 introduces two new experimental implementations of gradient boosting trees, namely HistGradientBoostingClassifier and HistGradientBoostingRegressor, inspired by LightGBM. These histogram-based estimators can be orders of magnitude faster than GradientBoostingClassifier and GradientBoostingRegressor when the number of samples is larger than tens of thousands of samples.
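A minimal sketch of that histogram-based variant on a larger synthetic set (generated here from y = a * b + c purely for illustration, since the large-sample regime is what the quote is about). On the older releases where the class was still experimental, the extra enabling import shown commented out below was required; recent releases import it directly:

import numpy as np
# from sklearn.experimental import enable_hist_gradient_boosting  # only needed on old releases where the class was experimental
from sklearn.ensemble import HistGradientBoostingRegressor

# Synthetic data: 50,000 samples of (a, b, c) with y = a*b + c.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50_000, 3))
y = X[:, 0] * X[:, 1] + X[:, 2]

model = HistGradientBoostingRegressor(max_iter=100)
model.fit(X, y)

print(model.predict(X[:3]))   # predictions for the first three rows
print(y[:3])                  # true values for comparison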
