Linear regression with Python

I am studying "Building Machine Learning Systems with Python" (2nd edition). I have a basic question about the answer part of the first chapter. According to the book, and based on my own observations, a 2nd-order polynomial always comes out as the best-fitting curve. Each time I train on the training dataset, I get a different test error for each polynomial, so the fitted parameters of the equation also differ from run to run. Surprisingly, though, I get approximately the same answer every time, in the range 9.19-9.99. My final hypothesis function has different parameters on each run, yet the answer is approximately the same. Can anyone tell me the reason behind this? [FYI: I am solving for y=100000.] I am sharing the code sample and the output of each iteration.

Here are the errors and the corresponding answers:

Thanks in advance!

# NumPy provides sum, polyfit, poly1d, etc. directly; the top-level scipy
# aliases used originally (sp.sum, sp.polyfit, ...) are deprecated.
import numpy as np
import matplotlib.pyplot as mp
from scipy.optimize import fsolve


def error(f, x, y):
    # Sum of squared residuals of model f over the points (x, y)
    return np.sum((f(x) - y) ** 2)


# Load the data and drop samples whose hit count is NaN
data = np.genfromtxt("web_traffic.tsv", delimiter="\t")
x = data[:, 0]
y = data[:, 1]
x = x[~np.isnan(y)]
y = y[~np.isnan(y)]

mp.scatter(x, y, s=10)
mp.title("web traffic over the month")
mp.xlabel("week")
mp.ylabel("hits/hour")
mp.xticks([w * 24 * 7 for w in range(10)], ["week %i" % i for i in range(10)])
mp.autoscale(enable=True, tight=True)
mp.grid(color='b', linestyle='-', linewidth=1)

# Fit only the data after the inflection point at week 3.5
inflection = int(3.5 * 7 * 24)
xa = x[inflection:]
ya = y[inflection:]
f1 = np.poly1d(np.polyfit(xa, ya, 1))
f2 = np.poly1d(np.polyfit(xa, ya, 2))
f3 = np.poly1d(np.polyfit(xa, ya, 3))
print(error(f1, xa, ya))
print(error(f2, xa, ya))
print(error(f3, xa, ya))

# Draw the fitted curves on top of the scatter plot, then display the figure
# (show() is called last so that the curves actually appear)
fx = np.linspace(0, xa[-1], 1000)
mp.plot(fx, f1(fx), linewidth=1)
mp.plot(fx, f2(fx), linewidth=2)
mp.plot(fx, f3(fx), linewidth=3)
mp.show()

# Hold out a random 30% of the points as a test set
frac = 0.3
partition = int(frac * len(xa))
shuffled = np.random.permutation(len(xa))
test = sorted(shuffled[:partition])
train = sorted(shuffled[partition:])
fbt1 = np.poly1d(np.polyfit(xa[train], ya[train], 1))
fbt2 = np.poly1d(np.polyfit(xa[train], ya[train], 2))
fbt3 = np.poly1d(np.polyfit(xa[train], ya[train], 3))
fbt4 = np.poly1d(np.polyfit(xa[train], ya[train], 4))
print("error in fbt1:%f" % error(fbt1, xa[test], ya[test]))
print("error in fbt2:%f" % error(fbt2, xa[test], ya[test]))
print("error in fbt3:%f" % error(fbt3, xa[test], ya[test]))

# Solve fbt2(x) = 100000 for x (in hours) and convert the result to weeks
print(fbt2)
print(fbt2 - 100000)
maxreach = fsolve(fbt2 - 100000, x0=800)[0] / (7 * 24)  # fsolve returns an array
print("ans:%f" % maxreach)
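One way to examine the observation in the question is to repeat the random split several times and print the fitted coefficients next to the resulting answer. The sketch below is only an illustration: it uses synthetic data as a stand-in for web_traffic.tsv, and the helper name `weeks_to_reach`, the noise level, and the seed are all assumptions, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable (assumption)

# Synthetic stand-in for the post-inflection data (NOT web_traffic.tsv)
xa = np.arange(588, 744, dtype=float)
ya = 0.05 * (xa - 500.0) ** 2 + 1500.0 + rng.normal(0.0, 50.0, xa.size)


def weeks_to_reach(xs, ys, target=100000.0, degree=2):
    """Fit a polynomial and return (week at which it reaches target, coefficients)."""
    p = np.poly1d(np.polyfit(xs, ys, degree))
    roots = (p - target).roots
    real = roots[np.isreal(roots)].real
    future = real[real > xs.max()]          # keep only roots beyond the data
    return future.min() / (7 * 24), p.coeffs


results = []
for _ in range(5):
    shuffled = rng.permutation(xa.size)
    train = np.sort(shuffled[int(0.3 * xa.size):])   # 70% training split
    weeks, coeffs = weeks_to_reach(xa[train], ya[train])
    results.append(weeks)
    print("coeffs:", coeffs, "-> weeks: %.2f" % weeks)
```

Each pass fits a different random 70% of the same points, so the printed coefficients and the printed week can be compared directly from run to run.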

Don't do it that way. Linear regression is more "up to you" than you think.

Start by getting the slope of the line: (#1) average((f(x2)-f(x))/(x2-x))

Then use that result as M in (#2) average(f(x)-M*x).

Now (#1) and (#2) together are your regression: the slope and the intercept.
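Steps (#1) and (#2) can be sketched in a few lines of Python. This is a minimal reading of the description, taking x2 to be the entry that follows x; the function name `average_slope_fit` and the sample data are assumptions made for illustration. The `np.polyfit` call is included only as a least-squares reference for comparison.

```python
import numpy as np


def average_slope_fit(x, y):
    """Line fit from averaged consecutive slopes (#1) and averaged offsets (#2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.mean(np.diff(y) / np.diff(x))   # (#1): average of (f(x2)-f(x))/(x2-x)
    b = np.mean(y - m * x)                 # (#2): average of f(x) - M*x
    return m, b


x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0                          # hypothetical noise-free line
m, b = average_slope_fit(x, y)
m_ls, b_ls = np.polyfit(x, y, 1)           # least-squares fit, for comparison
print(m, b, m_ls, b_ls)
```

On exactly linear data both approaches agree. Note that for evenly spaced x the averaged consecutive slope reduces to (y[-1]-y[0])/(x[-1]-x[0]), so on noisy data it depends only on the two end points and will generally differ from the least-squares slope.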

For any similar type of regression, for example polynomial, you need to subtract the A-factor (the first factor) by taking the n-th delta of f(x), each one with respect to delta(x). For example, delta(ax^2+bx+c)/delta(x) gives you an equation in a and b, and it works from there. If there are more entries, take the average each time. Do it like a window sliding down a sheet of paper: you select entries 1-10, then 2-11, 3-12, and so on, for some crazy awesome regression.

You may want to create a matrix API. The best way to handle it is to first create an API that takes one row and one column out. Then you experiment with that to automate it. The ratios of the in-out entries left in only 2 columns are averaged, and that average is the solution for the coefficient. Then write a program that takes rows out but leaves, for example, row 1 and row 5 (the output), then row 2 and row 5, and so on up to row 4 and row 5.
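The differencing idea and the sliding window can be sketched as follows. This is one possible interpretation of the description above, not the author's C implementation: it assumes evenly spaced x, and the names `fit_by_differences` and `windowed_fit` are hypothetical. It relies on the fact that, on an even grid with step h, the k-th difference of c*x^k is the constant c*k!*h^k while all lower powers difference to zero.

```python
import math
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def fit_by_differences(x, y, degree):
    """Estimate polynomial coefficients (highest power first) by repeated differencing.

    Assumes x is evenly spaced; this is not a least-squares fit.
    """
    x = np.asarray(x, dtype=float)
    residual = np.asarray(y, dtype=float).copy()
    h = x[1] - x[0]
    coeffs = []
    for k in range(degree, 0, -1):
        # Average the k-th differences to get the leading coefficient
        c = np.diff(residual, n=k).mean() / (math.factorial(k) * h ** k)
        coeffs.append(c)
        residual = residual - c * x ** k   # subtract the leading term, then repeat
    coeffs.append(residual.mean())         # what is left is the constant term
    return np.array(coeffs)


def windowed_fit(x, y, degree, width=10):
    """Average fit_by_differences over sliding windows (entries 1-10, 2-11, ...)."""
    xw = sliding_window_view(np.asarray(x, dtype=float), width)
    yw = sliding_window_view(np.asarray(y, dtype=float), width)
    return np.mean([fit_by_differences(a, b, degree) for a, b in zip(xw, yw)], axis=0)


x = np.arange(20, dtype=float)
y = 2.0 * x ** 2 - 3.0 * x + 5.0           # hypothetical noise-free data
coeffs = fit_by_differences(x, y, 2)
coeffs_windowed = windowed_fit(x, y, 2)
print(coeffs, coeffs_windowed)
```

On noise-free data both functions recover the coefficients a, b, c. Higher-order differences amplify noise, so results on real measurements may be considerably less stable than a least-squares fit.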
I wouldn't recommend Python for coding this. I recommend C, because it prevents you from making messy arrays that you lose track of. You need to understand systems theory and build system by system. It is insane to code matrices without building automated subsystems that are carefully tested. I failed until I worked on it in C: I first wrote a function that shrinks the matrix once and tested it carefully, then built a system to automate getting one coefficient and tested that, then automated repeating that program to solve the whole thing. You won't understand any of this by using Python or similar shortcuts; use them after you understand what they really do. That's how I learned. I am still amazed that I managed to code it. The problem, though, is that it is unstable above 4x4 (actually 4x5) matrices.
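One possible contributor to that kind of instability is ill-conditioning: powers of large x values differ by many orders of magnitude. As a hedged alternative to hand-written elimination (a swapped-in standard technique, not the method described in this answer), the sketch below solves the least-squares system with `np.linalg.lstsq` on a centred and scaled copy of x, and compares condition numbers. The names `polyfit_lstsq` and `predict` and the sample data are assumptions.

```python
import numpy as np


def polyfit_lstsq(x, y, degree):
    """Least-squares polynomial fit on a centred, scaled copy of x.

    Returns (coeffs, center, scale). The coefficients are highest power first
    and apply to t = (x - center) / scale, not to x itself.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    center = x.mean()
    scale = (x.max() - x.min()) / 2.0
    t = (x - center) / scale                 # t lies in [-1, 1]
    A = np.vander(t, degree + 1)             # columns t**degree, ..., t, 1
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, center, scale


def predict(model, x):
    """Evaluate a model returned by polyfit_lstsq at the points x."""
    coeffs, center, scale = model
    t = (np.asarray(x, dtype=float) - center) / scale
    return np.polyval(coeffs, t)


x = np.arange(588, 744, dtype=float)         # hypothetical hour indices
y = 0.05 * x ** 2 - 3.0 * x + 7.0            # hypothetical noise-free data
model = polyfit_lstsq(x, y, 2)

# Condition numbers of the cubic design matrix, with and without scaling
raw_cond = np.linalg.cond(np.vander(x, 4))
scaled_cond = np.linalg.cond(np.vander((x - x.mean()) / ((x.max() - x.min()) / 2.0), 4))
print(raw_cond, scaled_cond)
```

The scaled design matrix has a much smaller condition number than the raw one, which is why centring and scaling x is a common first step before fitting higher-degree polynomials.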

Good luck, Misha Taylor
