
Plot best fit line with plotly

I am using plotly's Python library to plot a scatter graph of time series data. Example data:

2015-11-11    1
2015-11-12    2
2015-11-14    4
2015-11-15    2
2015-11-21    3
2015-11-22    2
2015-11-23    3

Code in Python:

df = pandas.read_csv('~/Data.csv', parse_dates=["date"], header=0)
df = df.sort_values(by=['date'], ascending=[True])
trace = go.Scatter(
            x=df['date'],
            y=df['score'],
            mode='markers'
)
fig.append_trace(trace, 2, 2)  # It is a subplot
iplot(fig)

Once the scatter plot is plotted, I want to plot a best fit line over it.

Does plotly provide this programmatically? It does from the web app, but I did not find any documentation about how to do it programmatically. The line in the link is exactly what I want:

[image]

Your code snippet is missing a fig definition. I prefer using plotly.graph_objs, but with the setup below you can choose to show your figures using either fig.show() or iplot(fig). You won't be able to just include an argument and get a best fit line automatically, but you can certainly get one programmatically. You'll just need to add a couple of lines to your original setup and you're good to go.

Plot:

[image]

Complete code with sample data:

import pandas as pd
import datetime
import statsmodels.api as sm
import plotly.graph_objs as go
from plotly.offline import iplot

# sample data
df=pd.DataFrame({'date': {0: '2015-11-11',
                      1: '2015-11-12',
                      2: '2015-11-14',
                      3: '2015-11-15',
                      4: '2015-11-21',
                      5: '2015-11-22',
                      6: '2015-11-23'},
                     'score': {0: 1, 1: 2, 2: 4, 3: 2, 4: 3, 5: 2, 6: 3}})

df = df.sort_values(by=['date'], ascending=[True])

# data for time series linear regression
df['timestamp']=pd.to_datetime(df['date'])
df['serialtime']=[(d-datetime.datetime(1970,1,1)).days for d in df['timestamp']]

x = sm.add_constant(df['serialtime'])
model = sm.OLS(df['score'], x).fit()
df['bestfit']=model.fittedvalues

# plotly setup
fig=go.Figure()

# source data
fig.add_trace(go.Scatter(x=df['date'],
                         y=df['score'],
                         mode='markers',
                         name = 'score')
             )

# regression data
fig.add_trace(go.Scatter(x=df['date'],
                         y=df['bestfit'],
                         mode='lines',
                         name='best fit',
                         line=dict(color='firebrick', width=2)
                        ))

iplot(fig)

Some details:

Time series often present certain issues for linear OLS estimation. The format of the dates themselves can be challenging, so in this case it would be tempting to use the index of your dataframe as the independent variable. But since your dates are not continuous (there are gaps between them), simply replacing them with a continuous series would result in erroneous regression coefficients. I often find it best to use a serialized integer array to represent time series data, meaning that each date is represented by an integer which in turn is the count of days from some epoch, in this case 01.01.1970.

And that's exactly what I'm doing here:

df['timestamp'] = pd.to_datetime(df['date'])
df['serialtime'] = [(d - datetime.datetime(1970,1,1)).days for d in df['timestamp']]
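The same day count can also be computed without a Python loop. This is a sketch of an equivalent vectorized form, assuming the date column parses cleanly with pd.to_datetime. It is an alternative to the list comprehension above, not a correction of it.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2015-11-11", "2015-11-12", "2015-11-14"]})
df["timestamp"] = pd.to_datetime(df["date"])

# subtracting a Timestamp from a datetime column gives timedeltas;
# .dt.days extracts the whole-day count since the epoch
df["serialtime"] = (df["timestamp"] - pd.Timestamp("1970-01-01")).dt.days
```

The missing 2015-11-13 shows up as a jump of two in serialtime, which is the spacing information the regression needs.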

Here's a plot that illustrates the effects on your OLS estimates of using the wrong data:

[image]
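The difference can also be checked numerically. The sketch below fits the sample scores twice with numpy.polyfit (used here for brevity instead of statsmodels): once against real day counts and once against the row index. Days are counted from the first date rather than from 1970, which shifts only the intercept and leaves the slope unchanged.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2015-11-11", "2015-11-12", "2015-11-14", "2015-11-15",
                        "2015-11-21", "2015-11-22", "2015-11-23"])
score = np.array([1, 2, 4, 2, 3, 2, 3])

# real spacing in days: 0, 1, 3, 4, 10, 11, 12
days = np.asarray((dates - dates[0]).days)
# row index, which pretends the dates are evenly spaced: 0, 1, ..., 6
index = np.arange(len(dates))

slope_days, _ = np.polyfit(days, score, 1)
slope_index, _ = np.polyfit(index, score, 1)

print(f"slope per day:   {slope_days:.4f}")
print(f"slope per index: {slope_index:.4f}")
```

For this sample the slope is about 0.069 per day against about 0.179 per index step, so a line fitted on the index and drawn over the real dates would misrepresent the trend.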
