
Best fit line for trend

I have the following data:

import pandas as pd

df = pd.DataFrame({
'region'  : ['a', 'a', 'a','a',' a','a','a', 's', 's','s','l','a','c','a', 'e','a','g', 'd','c','d','a','f','a','a','a'],
'month_number' : [5, 12, 3, 12, 3, 6,7,8,9,10,11,12,4,5,2,6,7,8,3, 4, 7, 6,7,8,8],
'score' : [2.5, 5, 3.5, 2.5, 5.5, 3.5,2,3.5,4,2,1.5,1,1.5,4,5.5,2,3,1,2,3.5,4,2,3.5,3,4]})

I want to calculate the mean score for a region and plot its trend over the year. Finally, I want a line of best fit to see whether the trend is rising or falling over time (fitted to the means only, not for predicting values).

I filtered for region 'a':

filtered = df[(df['region'] == 'a')]
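One detail worth checking in the sample data: one region value is written as ' a' with a leading space, so the exact comparison above skips that row. The sketch below compares the exact filter with one that strips whitespace first. Whether the leading space is a typo is an assumption, and the names exact and cleaned are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['a', 'a', 'a', 'a', ' a', 'a', 'a', 's', 's', 's', 'l', 'a', 'c',
               'a', 'e', 'a', 'g', 'd', 'c', 'd', 'a', 'f', 'a', 'a', 'a'],
    'month_number': [5, 12, 3, 12, 3, 6, 7, 8, 9, 10, 11, 12, 4, 5, 2, 6, 7, 8,
                     3, 4, 7, 6, 7, 8, 8],
    'score': [2.5, 5, 3.5, 2.5, 5.5, 3.5, 2, 3.5, 4, 2, 1.5, 1, 1.5, 4, 5.5, 2,
              3, 1, 2, 3.5, 4, 2, 3.5, 3, 4]})

# Exact match: the row whose region is ' a' (leading space) is excluded.
exact = df[df['region'] == 'a']

# Assumption: the leading space is a typo, so strip whitespace before comparing.
cleaned = df[df['region'].str.strip() == 'a']

print(len(exact), len(cleaned))
```

The rest of this thread uses the exact filter, so its numbers correspond to exact, not cleaned.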

And plotted the trend:

filtered.groupby(['month_number','region']).mean()['score'].unstack().plot(figsize=(10,6))

This gives a line plot of the monthly mean score for region 'a'.

Now I am stuck on how to fit a line of best fit over the trend. My end goal is to create a column of plus or minus signs indicating a rising or falling trend in that region. If there is another approach, I would like to hear it.

You can do this with seaborn's regression plot, regplot, as follows. The shaded region is the confidence interval.

import seaborn as sns
import pandas as pd

df = pd.DataFrame({ 
'region'  : ['a', 'a', 'a','a',' a','a','a', 's', 's','s','l','a','c','a', 'e','a','g', 'd','c','d','a','f','a','a','a'],
'month_number' : [5, 12, 3, 12, 3, 6,7,8,9,10,11,12,4,5,2,6,7,8,3, 4, 7, 6,7,8,8],
'score' : [2.5, 5, 3.5, 2.5, 5.5, 3.5,2,3.5,4,2,1.5,1,1.5,4,5.5,2,3,1,2,3.5,4,2,3.5,3,4]})

filtered = df[(df['region'] == 'a')]
df1 = filtered.groupby(['month_number','region']).mean()['score'].unstack()
sns.regplot(x=df1.index.tolist(), y=df1['a'], data=df1)


If you don't want the shaded confidence interval, you can pass ci=0 (ci=None also disables it):

sns.regplot(x=df1.index.tolist(), y=df1['a'], data=df1, ci=0)


If you just want to plot the straight-line fit, use seaborn.

However, if you want to calculate the straight-line fit for the data, use numpy.polyfit.

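import numpy as np

# Mean score per month; select 'score' first so the string 'region' column is not averaged
f1 = filtered.groupby('month_number')['score'].mean().reset_index()
x = f1.month_number.values
y = f1.score.values
# Degree-1 fit: m is the slope, c is the y-intercept
m, c = np.polyfit(x, y, 1)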

This gives you the slope m and the y-intercept c of the fitted line.
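To make explicit what a degree-1 fit computes, here is a minimal pure-Python sketch of the closed-form least-squares slope and intercept. It is an illustration, not a replacement for numpy.polyfit. The inputs are the monthly means for region 'a' (the same values that appear in this answer's result table), and the names slope, intercept and trend are only illustrative.

```python
# Monthly means for region 'a' (months 3, 5, 6, 7, 8, 12).
x = [3, 5, 6, 7, 8, 12]
y = [3.5, 3.25, 2.75, 19 / 6, 3.5, 17 / 6]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares slope and intercept for a straight line.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# The sign of the slope gives the overall direction of the trend.
trend = '+' if slope > 0 else '-'
print(f"slope={slope:.4f}, intercept={intercept:.4f}, trend={trend}")
```

For this data the slope is slightly negative, so the overall trend for region 'a' is falling.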

You can then work out which points lie above and below the fitted line as follows:

yHat = m*x + c
yError = y - yHat

For your new column, just use the sign of these residuals:

f1['HiLo'] = np.where(yError > 0, '+', '-')

You will get your pluses and minuses:

month_number     score HiLo
           3  3.500000    +
           5  3.250000    -
           6  2.750000    -
           7  3.166667    +
           8  3.500000    +
          12  2.833333    -
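Note that the HiLo column marks each month as above or below the fitted line; it does not say whether the region as a whole is rising or falling. If the goal is one sign per region, one possible approach is to take the sign of the fitted slope for each region. The sketch below is an assumption about what is wanted, not part of the original answer. The helper trend_sign and the column name trend are hypothetical, and regions with fewer than two distinct months get an empty string because a single point cannot define a trend.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'region': ['a', 'a', 'a', 'a', ' a', 'a', 'a', 's', 's', 's', 'l', 'a', 'c',
               'a', 'e', 'a', 'g', 'd', 'c', 'd', 'a', 'f', 'a', 'a', 'a'],
    'month_number': [5, 12, 3, 12, 3, 6, 7, 8, 9, 10, 11, 12, 4, 5, 2, 6, 7, 8,
                     3, 4, 7, 6, 7, 8, 8],
    'score': [2.5, 5, 3.5, 2.5, 5.5, 3.5, 2, 3.5, 4, 2, 1.5, 1, 1.5, 4, 5.5, 2,
              3, 1, 2, 3.5, 4, 2, 3.5, 3, 4]})


def trend_sign(group):
    """Hypothetical helper: '+' or '-' from the slope of a line fitted to monthly means."""
    monthly = group.groupby('month_number')['score'].mean()
    if len(monthly) < 2:
        return ''  # fewer than two distinct months: no trend can be fitted
    slope = np.polyfit(monthly.index.to_numpy(dtype=float), monthly.to_numpy(), 1)[0]
    # A slope of exactly zero is reported as '-' here; adjust if a flat trend needs its own label.
    return '+' if slope > 0 else '-'


# One sign per region, then broadcast back onto every row of that region.
signs = {region: trend_sign(group) for region, group in df.groupby('region')}
df['trend'] = df['region'].map(signs)
print(signs)
```

With only a handful of points per region the sign is sensitive to individual values, so it is best read as a rough indicator.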

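Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.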
