
How to plot a linear regression with datetimes on the x-axis

My DataFrame object looks like this:

            amount
date    
2014-01-06  1
2014-01-07  1
2014-01-08  4
2014-01-09  1
2014-01-14  1

I would like a sort of scatter plot with time along the x-axis and amount on the y-axis, with a line through the data to guide the viewer's eye. If I use the pandas plot df.plot(style="o"), it's not quite right, because the line is not there. I would like something like the examples here.

Note: this has a lot in common with Ian Thompson's answer, but the approach is different enough to warrant a separate answer. I use the DataFrame format provided in the question and avoid changing the index.

Seaborn and other libraries don't deal with datetime axes as well as you might like them to. Here's how I'd work around it:

Start by adding a column of date ordinals

Seaborn will deal better with these than with dates. This is a handy trick for doing all kinds of mathy things with dates in libraries that don't love dates.

import pandas as pd
from datetime import date

# Assumes 'date' is a column; if it is the index, run df = df.reset_index() first
df['date_ordinal'] = pd.to_datetime(df['date']).apply(lambda d: d.toordinal())

[Figure: DataFrame with the date_ordinal column]
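For anyone who wants to see what the ordinal column actually contains, here is a minimal self-contained sketch. It rebuilds the question's data with 'date' as an ordinary column (that layout is my assumption) and checks that the conversion is reversible, which is what makes relabelling the axis possible afterwards. The names round_trip and gaps are just for illustration.

```python
from datetime import date

import pandas as pd

# The question's data, with 'date' as an ordinary column (an assumption;
# call df.reset_index() first if the dates live in the index).
df = pd.DataFrame({
    "date": ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"],
    "amount": [1, 1, 4, 1, 1],
})

# Proleptic Gregorian ordinal: 0001-01-01 is day 1, and each later day adds 1.
df["date_ordinal"] = pd.to_datetime(df["date"]).apply(lambda d: d.toordinal())

# Converting back recovers the original dates.
round_trip = [date.fromordinal(int(n)).isoformat() for n in df["date_ordinal"]]

# Differences between consecutive ordinals are gaps in days, so the
# regression sees the real spacing between observations.
gaps = df["date_ordinal"].diff().tolist()

print(df)
print(round_trip)
print(gaps)
```

Because ordinals are plain integers spaced one per day, the five-day jump before the last observation is preserved on the x-axis.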

Make a plot with the ordinals on the date axis

import seaborn

ax = seaborn.regplot(
    data=df,
    x='date_ordinal',
    y='amount',
)
# Tighten up the axes for prettiness
ax.set_xlim(df['date_ordinal'].min() - 1, df['date_ordinal'].max() + 1)
ax.set_ylim(0, df['amount'].max() + 1)

Replace the ordinal x-axis labels with nice, readable dates

ax.set_xlabel('date')
new_labels = [date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(new_labels)

[Figure: scatter plot with regression line]

ta-daa!
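One possible refinement, offered as a sketch rather than as part of the answer above: set_xticklabels fixes the label strings to whatever ticks exist at that moment, so the labels can go stale if the axis limits change later. A matplotlib FuncFormatter computes each label from the tick value instead. The helper name ordinal_to_label is hypothetical, and the Agg backend is only there so the snippet runs without a display.

```python
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is available)
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator


def ordinal_to_label(value, pos=None):
    """Format a tick position (a possibly fractional ordinal) as YYYY-MM-DD."""
    return date.fromordinal(int(round(value))).isoformat()


# The question's data expressed as ordinals.
ordinals = [date(2014, 1, d).toordinal() for d in (6, 7, 8, 9, 14)]
amounts = [1, 1, 4, 1, 1]

fig, ax = plt.subplots()
ax.scatter(ordinals, amounts)

# Keep ticks on whole days, then let the formatter turn each one into a date.
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(ordinal_to_label))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
ax.set_xlabel("date")
```

The same two set_major_* calls should work on the Axes returned by seaborn.regplot, since that is an ordinary matplotlib Axes.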

Since Seaborn has trouble with dates, I'm going to create a work-around. First, I'll make the Date column my index:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Make dataframe
df = pd.DataFrame({'amount' : [1,
                               1,
                               4,
                               1,
                               1]},
                  index = ['2014-01-06',
                           '2014-01-07',
                           '2014-01-08',
                           '2014-01-09',
                           '2014-01-14'])

Second, convert the index to pd.DatetimeIndex:

# Make index pd.DatetimeIndex
df.index = pd.DatetimeIndex(df.index)

Then build a new index containing every calendar day between the first and last date:

# Make new index
idx = pd.date_range(df.index.min(), df.index.max())

Third, reindex with the new index (idx):

# Replace original index with idx
df = df.reindex(index = idx)

This will produce a new dataframe with NaN values for the dates that have no data:

[Figure: the reindexed DataFrame, with NaN in the rows for missing dates]
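As a quick check of what the reindex step does, here is a minimal sketch on the question's data. The names filled and missing_days are mine, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"amount": [1, 1, 4, 1, 1]},
    index=pd.DatetimeIndex(
        ["2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09", "2014-01-14"]
    ),
)

# One entry per calendar day between the first and last observation.
idx = pd.date_range(df.index.min(), df.index.max())

# reindex keeps the existing rows and inserts NaN for days absent from df.
filled = df.reindex(idx)

# Days that had no observation in the original data.
missing_days = filled.index[filled["amount"].isna()].strftime("%Y-%m-%d").tolist()

print(filled)
print(missing_days)
```

Note that the amount column becomes floating point after reindexing, because NaN cannot be stored in an integer column.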

Fourth, since Seaborn doesn't play nicely with dates and regression lines, I'll create a row count column that we can use as our x-axis:

# Insert row count
df.insert(df.shape[1],
          'row_count',
          df.index.value_counts().sort_index().cumsum())

Fifth, we should now be able to plot a regression line using 'row_count' as our x variable and 'amount' as our y variable:

# Plot regression using Seaborn
fig = sns.regplot(data = df, x = 'row_count', y = 'amount')
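Why a row count is a fair stand-in for the dates: on a gap-free daily index, row_count is just the date ordinal minus a constant, so a least-squares fit gives the same slope (in amount per day) and only the intercept shifts. The sketch below checks this with NumPy on the five observed points, assuming the NaN rows are dropped before fitting (regplot's default, as far as I know). fit_line is a hypothetical helper, not a seaborn function.

```python
from datetime import date

import numpy as np


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (x is centred first)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
    return slope, y.mean() - slope * x.mean()


amount = [1, 1, 4, 1, 1]
row_count = [1, 2, 3, 4, 9]  # positions of the observed days in the 9-day range
first_ordinal = date(2014, 1, 6).toordinal()
ordinal = [first_ordinal + r - 1 for r in row_count]

slope_rows, intercept_rows = fit_line(row_count, amount)
slope_ord, intercept_ord = fit_line(ordinal, amount)

print("slope per day (row_count):", slope_rows)
print("slope per day (ordinal):  ", slope_ord)
print("intercepts:", intercept_rows, intercept_ord)
```

So the choice between row counts and ordinals affects only how the tick labels are mapped back to dates, not the fitted trend.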

Sixth, if you would like the dates to be along the x-axis instead of the row_count, you can set the x-tick labels to the index:

# Change x-ticks to dates
labels = [item.get_text() for item in fig.get_xticklabels()]

# Assign to labels[1:10]: labels has 11 elements (index 0 is the left edge, index 10 is
# the right edge), but our data has only 9 rows
labels[1:10] = df.index.date

# Set x-tick labels
fig.set_xticklabels(labels)

# Rotate the labels so you can read them
plt.xticks(rotation = 45)

# Change x-axis title
plt.xlabel('date')

plt.show()

[Figure: regression plot with dates on the x-axis]
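The slice assignment above relies on there being exactly 11 ticks at 0 through 10. A less position-dependent option, sketched here with the standard library only, is to compute each label from the tick value itself. row_count_to_date is a hypothetical helper, and the tick positions are made up for the example; in practice they would come from the Axes' get_xticks().

```python
from datetime import date, timedelta


def row_count_to_date(tick, first_day):
    """Map a row_count tick value back to a calendar date (row 1 is first_day)."""
    return first_day + timedelta(days=int(round(tick)) - 1)


first_day = date(2014, 1, 6)

# Hypothetical tick positions; real ones depend on the plot.
ticks = [0, 2, 4, 6, 8, 10]
labels = [row_count_to_date(t, first_day).isoformat() for t in ticks]

print(labels)
```

Ticks that fall outside the data (such as 0 or 10 here) still get a sensible date, one day before or after the observed range.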

Hope this helps!

  • The datetime dtype values must be converted to a numeric form, such as ordinals.
  • The regression can be plotted by fitting the model with sklearn.linear_model.LinearRegression and then adding the regression line with matplotlib.pyplot.plot.
    • sns.lineplot(x=[x1_date, x2_date], y=[y1, y2], label='Linear Model', color='magenta') also works.
  • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, sklearn 0.24.2

import numpy as np
import pandas as pd
import yfinance as yf  # for data
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# download the data
data = yf.download('aapl', '2019-01-02', '2021-01-01')

# add an ordinal column because sklearn doesn't work with datetimes
data['ordinal'] = data.index.map(pd.Timestamp.toordinal)

# create the model
model = LinearRegression()

# extract x and y from dataframe data
x = data[['ordinal']]
y = data[['Adj Close']]

# fit the model
model.fit(x, y)

# print the slope and intercept if desired
print('intercept:', model.intercept_[0])
print('slope:', model.coef_[0][0])

# select x1 and x2 and get the corresponding date from the index
x1 = data.ordinal.min()
x1_date = data[data.ordinal.eq(x1)].index[0]
x2 = data.ordinal.max()
x2_date = data[data.ordinal.eq(x2)].index[0]

# calculate y1, given x1
y1 = model.predict(np.array([[x1]]))[0][0]

print('y1:', y1)

# calculate y2, given x2
y2 = model.predict(np.array([[x2]]))[0][0]

print('y2:', y2)

[out]:
intercept: -90078.45713565295
slope: 0.12225139598567565
y1: 28.279040945126326
y2: 117.40030861868581
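The large negative intercept looks odd until you remember that it is the model's value at ordinal 0, roughly two thousand years before the data. A small standard-library sketch reproduces y1 and y2 from the printed slope and intercept, assuming the first and last rows of the download are 2019-01-02 and 2020-12-31 (which is what the requested range suggests, but I have not re-downloaded the data). predict here is a plain function of my own, not the sklearn method.

```python
from datetime import date

# Values copied from the printed output.
intercept = -90078.45713565295
slope = 0.12225139598567565


def predict(day):
    """Evaluate the fitted line y = intercept + slope * ordinal for a given date."""
    return intercept + slope * day.toordinal()


y1 = predict(date(2019, 1, 2))     # assumed first trading day in the data
y2 = predict(date(2020, 12, 31))   # assumed last trading day in the data

print("y1:", y1)
print("y2:", y2)
```

The slope is in price units per calendar day, so it can be read directly as the average daily change over the period.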

Plot

ax1 = data.plot(y='Adj Close', c='k', figsize=(15, 6), grid=True, legend=False)
ax1.plot([x1_date, x2_date], [y1, y2], label='Linear Model', c='magenta')
ax1.legend()

