Line of best fit for a trend

Question

I have the following data:

import pandas as pd

df = pd.DataFrame({
'region'  : ['a', 'a', 'a','a',' a','a','a', 's', 's','s','l','a','c','a', 'e','a','g', 'd','c','d','a','f','a','a','a'],
'month_number' : [5, 12, 3, 12, 3, 6,7,8,9,10,11,12,4,5,2,6,7,8,3, 4, 7, 6,7,8,8],
'score' : [2.5, 5, 3.5, 2.5, 5.5, 3.5,2,3.5,4,2,1.5,1,1.5,4,5.5,2,3,1,2,3.5,4,2,3.5,3,4]})

I want to calculate the mean score per month for a region and plot its trend over the year. Finally, I want a line of best fit to see whether the trend is rising or falling over time. (This is not for predicting values, only for describing the monthly means.)

I filtered on region 'a':

filtered = df[(df['region'] == 'a')]

And plotted the trend:

filtered.groupby(['month_number','region']).mean()['score'].unstack().plot(figsize=(10,6))

This gives the following plot:

Now I am stuck on how to fit a line of best fit to this trend. My end goal is to create a column of pluses and minuses indicating a rising or falling trend in that region. If there is another approach to this, I would like to hear it.

Answer 1

You can do this with seaborn's regression plot, regplot, as follows. The shaded region is the confidence interval.

import seaborn as sns
import pandas as pd

df = pd.DataFrame({ 
'region'  : ['a', 'a', 'a','a',' a','a','a', 's', 's','s','l','a','c','a', 'e','a','g', 'd','c','d','a','f','a','a','a'],
'month_number' : [5, 12, 3, 12, 3, 6,7,8,9,10,11,12,4,5,2,6,7,8,3, 4, 7, 6,7,8,8],
'score' : [2.5, 5, 3.5, 2.5, 5.5, 3.5,2,3.5,4,2,1.5,1,1.5,4,5.5,2,3,1,2,3.5,4,2,3.5,3,4]})

filtered = df[(df['region'] == 'a')]
df1 = filtered.groupby(['month_number','region']).mean()['score'].unstack()
sns.regplot(x=df1.index.tolist(), y=df1['a'], data=df1)

If you don't want the shaded confidence interval, pass ci=0:

sns.regplot(x=df1.index.tolist(), y=df1['a'], data=df1, ci=0)

Answer 2

If you just want to plot the straight-line fit, use seaborn.

However, if you want to calculate the straight-line fit for the data, use numpy.polyfit.

import numpy as np
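# Select 'score' before averaging so the non-numeric 'region' column is not passed to mean()
f1 = filtered.groupby('month_number')['score'].mean().reset_index()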
x = f1.month_number.values
y = f1.score.values
m, c = np.polyfit(x, y, 1)

This gives you the slope m and the y-intercept c of the fitted line.
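The sign of the slope is what tells you whether the trend is rising or falling. Here is a minimal self-contained sketch of that check for region 'a'; the names monthly, slope, intercept and trend are only illustrative, and treating a perfectly flat slope as '-' is an assumption made here for simplicity.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'region': ['a', 'a', 'a', 'a', ' a', 'a', 'a', 's', 's', 's', 'l', 'a', 'c',
               'a', 'e', 'a', 'g', 'd', 'c', 'd', 'a', 'f', 'a', 'a', 'a'],
    'month_number': [5, 12, 3, 12, 3, 6, 7, 8, 9, 10, 11, 12, 4, 5, 2, 6, 7, 8,
                     3, 4, 7, 6, 7, 8, 8],
    'score': [2.5, 5, 3.5, 2.5, 5.5, 3.5, 2, 3.5, 4, 2, 1.5, 1, 1.5, 4, 5.5, 2,
              3, 1, 2, 3.5, 4, 2, 3.5, 3, 4]})

# Monthly mean score for region 'a' (exact match, so the ' a' entry is excluded)
monthly = (df[df['region'] == 'a']
           .groupby('month_number')['score']
           .mean()
           .reset_index())

# Degree-1 polyfit returns the coefficients highest power first: slope, then intercept
slope, intercept = np.polyfit(monthly['month_number'].to_numpy(),
                              monthly['score'].to_numpy(), 1)

# Assumption: a slope of exactly zero is labelled '-' as well
trend = '+' if slope > 0 else '-'
print(slope, trend)
```

For this data the slope comes out slightly negative, so the monthly means for region 'a' are falling over the year.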

You can then work out which points lie above or below the fitted line:

yHat = m*x + c
yError = y - yHat

For your new column, just use the sign of the error values:

# '+' where the point lies above the fitted line, '-' otherwise (avoids reusing the name m)
f1['HiLo'] = np.where(yError > 0, '+', '-')

You will get your pluses and minuses:

month_number     score HiLo
           3  3.500000    +
           5  3.250000    -
           6  2.750000    -
           7  3.166667    +
           8  3.500000    +
          12  2.833333    -
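Note that HiLo marks each month as above or below the fitted line. If what you want is a single rising/falling flag per region, as the question describes, the sign of the fitted slope may be closer to it. The following is only a sketch under some assumptions: trend_sign and the 'trend' column are hypothetical names, regions with fewer than two distinct months get NaN because no line can be fitted, and a flat slope is labelled '-'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'region': ['a', 'a', 'a', 'a', ' a', 'a', 'a', 's', 's', 's', 'l', 'a', 'c',
               'a', 'e', 'a', 'g', 'd', 'c', 'd', 'a', 'f', 'a', 'a', 'a'],
    'month_number': [5, 12, 3, 12, 3, 6, 7, 8, 9, 10, 11, 12, 4, 5, 2, 6, 7, 8,
                     3, 4, 7, 6, 7, 8, 8],
    'score': [2.5, 5, 3.5, 2.5, 5.5, 3.5, 2, 3.5, 4, 2, 1.5, 1, 1.5, 4, 5.5, 2,
              3, 1, 2, 3.5, 4, 2, 3.5, 3, 4]})


def trend_sign(group):
    """Return '+' or '-' from the slope of a straight-line fit to the monthly means."""
    monthly = group.groupby('month_number')['score'].mean()
    if len(monthly) < 2:
        return np.nan  # a single month cannot define a trend
    slope = np.polyfit(monthly.index.to_numpy(), monthly.to_numpy(), 1)[0]
    return '+' if slope > 0 else '-'  # assumption: a flat slope counts as '-'


# One trend flag per region, then broadcast back onto every row of that region
trend_by_region = {region: trend_sign(group) for region, group in df.groupby('region')}
df['trend'] = df['region'].map(trend_by_region)
print(df[['region', 'trend']].drop_duplicates())
```

With so few points per region the fitted slope is fragile, so it is worth checking how many months each region actually covers before trusting the flag.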
