Python: Finding a trend in a set of numbers

I have a list of numbers in Python, like this:

x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

What's the best way to find the trend in these numbers? I'm not interested in predicting what the next number will be; I just want to output the trend for many sets of numbers so that I can compare the trends.

Edit: By trend, I mean that I'd like a numerical representation of whether the numbers are increasing or decreasing and at what rate. I'm not massively mathematical, so there's probably a proper name for this!

Edit 2: It looks like what I really want is the coefficient of the linear best fit. What's the best way to get this in Python?

Possibly you mean you want to plot these numbers on a graph and find a straight line through them where the overall distance between the line and the numbers is minimized? This is called a linear regression.

def linreg(X, Y):
    """
    Return (a, b) for the line y = a*x + b that minimizes the root mean
    square vertical distance between the trend line and the original points.
    """
    N = len(X)
    Sx = Sy = Sxx = Sxy = 0.0
    for x, y in zip(X, Y):
        Sx += x
        Sy += y
        Sxx += x * x
        Sxy += x * y
    det = Sxx * N - Sx * Sx  # zero if all x values are identical
    return (Sxy * N - Sy * Sx) / det, (Sxx * Sy - Sx * Sxy) / det


x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
# The list named x holds the y-values here; its indices serve as the x-values.
a, b = linreg(range(len(x)), x)

The trend line is unlikely to pass through your original points, but it will be as close to them as a straight line can get. Using the gradient and intercept values of this trend line (a, b), you can extrapolate the line past the end of the array:

extrapolatedtrendline = [a * index + b for index in range(20)]  # replace 20 with the desired trend length
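Since the question asks about comparing trends across many sets of numbers, the same closed-form fit can be applied to each set in turn. The following is only a sketch under stated assumptions: the helper name `slope` and the series labelled "falling" and "flat" are made up for illustration, and each list is fitted against its own indices 0..n-1.

```python
def slope(values):
    """Least-squares slope of values against their indices 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


series = {
    "question": [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81],
    "falling": [50, 45, 47, 40, 38, 30],  # hypothetical data
    "flat": [10, 10, 10, 10],             # hypothetical data
}

slopes = {name: slope(vals) for name, vals in series.items()}

# Rank the series from the steepest rise to the steepest fall.
ranked = sorted(slopes, key=slopes.get, reverse=True)
for name in ranked:
    print(f"{name}: {slopes[name]:+.3f} per step")
```

The slopes are only directly comparable when the series share the same units and sampling interval; otherwise some normalization would be needed first.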

The link provided by Keith, or the answer from Riaz, might help you get the polynomial fit, but it is always recommended to use a library if one is available. For the problem at hand, numpy provides a polynomial fit function called polyfit. You can use polyfit to fit the data with a polynomial of any degree.

Here is an example using numpy to fit the data to a linear equation of the form y = ax + b:

>>> import numpy as np
>>> data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> x = np.arange(0, len(data))
>>> y = np.array(data)
>>> z = np.polyfit(x, y, 1)  # coefficients, highest degree first
>>> print("{0:.4f}x + {1:.4f}".format(*z))
4.3253x + 17.6000

Similarly, a quadratic fit would be:

>>> z = np.polyfit(x, y, 2)
>>> print("{0:.4f}x^2 + {1:.4f}x + {2:.4f}".format(*z))
0.3111x^2 + 0.2806x + 25.6893

Here is one way to get an increasing/decreasing trend:

>>> x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> trend = [b - a for a, b in zip(x, x[1:])]
>>> trend
[22, -5, 9, -4, 17, -22, 5, 13, -13, 21, 39, -26, 13]

In the resulting list trend, trend[0] can be interpreted as the increase from x[0] to x[1], trend[1] as the increase from x[1] to x[2], and so on. Negative values in trend mean that the value in x decreased from one index to the next.
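The list of differences can be condensed into a few summary numbers if a single figure per data set is needed. A minimal sketch, assuming only the list from the question (the variable names are illustrative): note that the mean step telescopes to (last - first) / (n - 1), so it depends only on the two endpoints and is more sensitive to noise than a least-squares slope.

```python
x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

# Step-to-step changes, as in the answer above.
steps = [b - a for a, b in zip(x, x[1:])]

# Average change per step; the intermediate values cancel out.
mean_step = sum(steps) / len(steps)
endpoint_rate = (x[-1] - x[0]) / (len(x) - 1)

# How often the series went up versus down.
rising = sum(1 for s in steps if s > 0)
falling = sum(1 for s in steps if s < 0)

print(f"mean step: {mean_step:.3f} (endpoint rate: {endpoint_rate:.3f})")
print(f"rising steps: {rising}, falling steps: {falling}")
```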

You could do a least squares fit of the data.

Using the formula from this page:

y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
N = len(y)
x = range(N)
# B is the slope, A is the intercept of the least squares line
B = (sum(x[i] * y[i] for i in range(N)) - 1./N*sum(x)*sum(y)) / (sum(x[i]**2 for i in range(N)) - 1./N*sum(x)**2)
A = 1.*sum(y)/N - B * 1.*sum(x)/N
print("%f + %f * x" % (A, B))

This prints the intercept (starting value) and the slope (change per step) of the best-fit line.
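For reference, the two expressions computed by the code correspond to the usual closed form of a simple least squares fit, written here with the same names A (intercept) and B (slope) and with all sums running over the N data points:

```latex
\[
B = \frac{\sum_i x_i y_i - \frac{1}{N} \sum_i x_i \sum_i y_i}
         {\sum_i x_i^2 - \frac{1}{N} \left( \sum_i x_i \right)^2},
\qquad
A = \bar{y} - B\,\bar{x},
\quad \text{where } \bar{x} = \frac{1}{N} \sum_i x_i,\ \bar{y} = \frac{1}{N} \sum_i y_i .
\]
```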

I agree with Keith; I think you're probably looking for a linear least squares fit (if all you want to know is whether the numbers are generally increasing or decreasing, and at what rate). The slope of the fit will tell you at what rate they're increasing. If you want a visual representation of a linear least squares fit, try Wolfram Alpha:

http://www.wolframalpha.com/input/?i=linear+fit+%5B12%2C+34%2C+29%2C+38%2C+34%2C+51%2C+29%2C+34%2C+47%2C+34%2C+55%2C+94%2C+68%2C+81%5D

Update: If you want to implement a linear regression in Python, I recommend starting with the explanation at Mathworld:

http://mathworld.wolfram.com/LeastSquaresFitting.html

It's a very straightforward explanation of the algorithm, and it practically writes itself. In particular, you want to pay close attention to equations 16-21, 27, and 28.

Try writing the algorithm yourself, and if you have problems, you should open another question.

You can find the OLS coefficient using numpy:

import numpy as np

y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

x = []
x.append(list(range(len(y))))  # time variable
x.append([1] * len(y))         # column of ones for the intercept

y = np.matrix(y).T
x = np.matrix(x).T

betas = (x.T * x).I * x.T * y  # normal equations: (X'X)^-1 X'y

Results:

>>> betas
matrix([[  4.32527473],  #coefficient on the time variable
        [ 17.6       ]]) #coefficient on the intercept

Since the coefficient on the trend variable is positive, observations in your variable are increasing over time.
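The same coefficients can also be obtained without forming the matrix inverse explicitly. The sketch below swaps in numpy's `np.linalg.lstsq` solver on plain arrays; this is an alternative to the answer's approach rather than a restatement of it, and the variable names are illustrative.

```python
import numpy as np

y = np.array([12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81], dtype=float)
t = np.arange(len(y), dtype=float)  # time variable 0..n-1

# Design matrix: one column for time, one column of ones for the intercept.
X = np.column_stack([t, np.ones_like(t)])

# Solve the least squares problem min ||X @ coeffs - y||.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coeffs

print(f"slope: {slope:.4f}, intercept: {intercept:.4f}")
```

A dedicated solver is generally the better-conditioned choice when the design matrix has many columns or nearly collinear ones; for a two-column problem like this one, both routes give the same answer to within rounding.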

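You can simply use the scipy library. Note that x starts at 1 here, so the intercept differs from a fit that indexes from 0, while the slope is the same: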

import numpy as np
from scipy.stats import linregress

data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
x = np.arange(1, len(data) + 1)
y = np.array(data)
res = linregress(x, y)
print(f'Equation: {res.slope:.3f} * t + {res.intercept:.3f}, R^2: {res.rvalue ** 2:.2f} ')
res

Output:

Equation: 4.325 * t + 13.275, R^2: 0.66 
LinregressResult(slope=4.325274725274725, intercept=13.274725274725277, rvalue=0.8096297800892154, pvalue=0.0004497809466484867, stderr=0.9051717124425395, intercept_stderr=7.707259409345618)
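To make the R^2 figure less of a black box, the correlation coefficient reported as `rvalue` can be reproduced by hand. This is only an illustrative sketch in pure Python; the helper name `pearson_r` is made up here and is not part of scipy.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
t = list(range(1, len(data) + 1))

r = pearson_r(t, data)
r_squared = r ** 2  # share of the variance explained by the straight line

print(f"r = {r:.4f}, R^2 = {r_squared:.2f}")
# prints: r = 0.8096, R^2 = 0.66
```

When comparing trends across data sets, R^2 indicates how well a straight line describes each one, which helps judge how much weight to give each slope.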

Compute the beta coefficient: the covariance of x and y divided by the variance of x, which equals the least-squares slope.

y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
x = range(1, len(y) + 1)

def var(X):
    # sample variance (divides by n - 1)
    S = 0.0
    SS = 0.0
    for x in X:
        S += x
        SS += x * x
    xbar = S / float(len(X))
    return (SS - len(X) * xbar * xbar) / (len(X) - 1.0)

def cov(X, Y):
    # sample covariance; float() keeps the means from being truncated
    n = len(X)
    xbar = sum(X) / float(n)
    ybar = sum(Y) / float(n)
    return sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / (n - 1.0)


def beta(x, y):
    return cov(x, y) / var(x)

print(beta(x, y))  # about 4.3253, the same as the least-squares slope
