
What is the 'predictive' element of machine learning?

Original post: 2016-04-06 08:22:49. Tags: algorithm, machine-learning, statistics, analysis

I'm hoping someone with a lot more knowledge of machine learning can help me out here. I've been reading examples of regression and classification, and I always seem to come back to the same question: what is really the difference between what this algorithm is doing and what standard statistical analysis would do?

Specifically, none of the examples I read seem to discuss the predictive element. For example, when looking at linear regression, the articles commonly explain the concept of trying to create a 'best fit': you start from a linear equation and then iteratively minimise a cost function until it reaches a minimum. Of course, a lot of emphasis is put throughout on a 'training data set'. No problem... but this is usually where it ends. At this point I can't see the difference between the above and the standard way in which one would carry out statistical analysis on a data set that was assumed to have a linear relationship. Presumably, future values here are 'predicted' from the equation that was produced when the cost function converged on a minimum. Again, there doesn't seem to be much 'learning' here, as this is exactly what would be done in the usual case.

After a long-winded intro... what I'm trying to ask is: how has the algorithm learned from the original training data, and how does this training set help with future data sets? (Again, this is where I get a bit lost. To me it seems that you would give it a new data set and carry out the same task of minimising the cost function. This time you have a better 'starting' point, but all of your knowledge really comes from what you already 'knew' about the dataset, i.e. that a linear relationship was assumed.)

I hope this makes sense. It's clearly a lack of understanding on my part, but I'm hoping someone can shove me in the right direction.

Thanks!

1 Answer

You are right, there is no difference. Linear regression is purely a statistical method, and "fitting" would probably be more accurate than "learning" in this case. But again, this is usually just the first lecture on the subject. There are many approaches where the differences are much clearer, for example SVMs. There are also approaches where the "learning" aspect is much clearer, e.g. using reinforcement learning in games, where you can actually see your system improve its performance with experience.
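To make the "fitting" point concrete, here is a minimal sketch in Python with NumPy. The data, the learning rate and the function names fit_linear and predict are all made up for illustration. It fits a line by gradient descent on the mean squared error, then predicts for new inputs by plugging them into the fitted equation. Note that the new inputs are never used to minimise the cost again.

```python
import numpy as np

def fit_linear(x, y, lr=0.05, steps=5000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y               # residuals on the training set
        w -= lr * 2.0 * np.mean(err * x)  # d(MSE)/dw
        b -= lr * 2.0 * np.mean(err)      # d(MSE)/db
    return w, b

def predict(params, x_new):
    """Apply the already-fitted line to unseen inputs (no refitting)."""
    w, b = params
    return w * np.asarray(x_new, dtype=float) + b

# Synthetic training data: an assumed true line y = 3x + 1 plus noise.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2.0, size=50)
y_train = 3.0 * x_train + 1.0 + rng.normal(0.0, 0.1, size=50)

params = fit_linear(x_train, y_train)  # "training": done once
y_new = predict(params, [0.5, 1.5])    # "prediction": just evaluate the line
print(params, y_new)
```

On this kind of data the result should agree closely with an ordinary least-squares fit, which is exactly the point: for linear regression the "learning" step and the statistical fit are the same thing.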

Anyway, the main subject of machine learning is learning from examples. You are given a list of 100 patients, along with blood pressure, age, cholesterol level etc., and for each of them you are told whether they have heart disease or not. Then you are given a patient that you have not seen before. Does this patient have heart disease? Most people call this prediction. You might prefer to call it fitting, or anything else. But the fact is, it usually works quite well.
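The patient example can be sketched the same way. The code below is only an illustration: the 100 patients are randomly generated, the rule that labels them is invented, and logistic regression is just one of many classifiers you could pick (the names fit_logistic and predict_proba are hypothetical too). The model is fitted once on the labelled patients and then asked about a patient it has never seen.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up training set: 100 patients, three measurements each.
n = 100
X = np.column_stack([
    rng.normal(130.0, 15.0, n),  # systolic blood pressure
    rng.normal(55.0, 10.0, n),   # age in years
    rng.normal(220.0, 30.0, n),  # cholesterol level
])
# Invented labelling rule (illustration only, not medical knowledge).
score = (0.03 * (X[:, 0] - 130.0) + 0.05 * (X[:, 1] - 55.0)
         + 0.01 * (X[:, 2] - 220.0) + rng.normal(0.0, 0.3, n))
y = (score > 0).astype(float)    # 1.0 = has heart disease

# Scaling statistics are computed from the training set only.
mean, std = X.mean(axis=0), X.std(axis=0)

def standardize(a):
    return (np.asarray(a, dtype=float) - mean) / std

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(Xs, y, lr=0.1, steps=2000):
    """Gradient descent on the average logistic (cross-entropy) loss."""
    w, b = np.zeros(Xs.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(Xs @ w + b)
        w -= lr * Xs.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_logistic(standardize(X), y)

def predict_proba(patient):
    """Estimated probability of heart disease for an unseen patient."""
    return sigmoid(standardize(patient) @ w + b)

new_patient = [165.0, 72.0, 280.0]  # not part of the training set
p_new = predict_proba(new_patient)
train_acc = np.mean((sigmoid(standardize(X) @ w + b) > 0.5) == (y == 1.0))
print(p_new, train_acc)
```

Whether you call the last step prediction or fitting, the mechanics are the same: the parameters come from the 100 labelled examples, and the new patient is only ever passed through the fitted function.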

Still, the subject remains closely tied to statistics, and indeed, you need to make some assumptions (to a greater or lesser extent, depending on the algorithm) about the underlying function. It is not perfect, but in many cases it's the best thing we have, so I would say it is worth studying. If you are starting now, there is a great online course, Stanford's "Statistical Learning", which deals with the subject from your point of view.