Sklearn Linear Regression with Date Data

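I'm having some trouble feeding date data into sklearn's linear regression. I know I need to convert the dates into some kind of ordinal, but I'm not familiar enough with Python to know how to do that. Here is what I have: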

import matplotlib.pyplot as plt
import numpy as np

from sklearn import linear_model

data_time = np.asarray(['2017-05-24','2017-05-25','2017-05-26','2017-05-27','2017-05-28','2017-05-29','2017-05-30','2017-05-31','2017-06-01','2017-06-02','2017-06-03','2017-06-04','2017-06-05','2017-06-06','2017-06-07','2017-06-08','2017-06-09','2017-06-10','2017-06-11','2017-06-12','2017-06-13','2017-06-14','2017-06-15','2017-06-16','2017-06-17','2017-06-18','2017-06-19','2017-06-20','2017-06-21']).reshape(-1, 1)
data_count = np.asarray([300.000,301.000,302.000,303.000,304.000,305.000,306.000,307.000,308.000,309.000,310.000,311.000,312.000,230.367,269.032,258.867,221.645,222.323,212.357,198.516,230.133,243.903,244.320,207.451,192.710,212.033,216.677,222.333,208.710]).reshape(-1, 1)

regr = linear_model.LinearRegression()
regr.fit(data_time, data_count)

# Make predictions using the testing set
y_pred = regr.predict(data_time)

plt.title('My Title')
plt.xlabel('Date')
plt.ylabel('Metric')

plt.scatter(data_time, data_count,  color='black')
plt.plot(data_time, y_pred, color='orange', linewidth=3)

plt.show()

Naturally, this produces the error:

ValueError: could not convert string to float: '2017-05-24'

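Any help is appreciated! Side note: if possible, I'd prefer not to move away from this numpy array format, because I have already written a C++ GUI wrapper that generates the Python code in the background.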

You can use pandas (pd.to_datetime()) to parse the dates and then pass their numeric values to the regression, as follows:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn import linear_model

data_time = np.asarray(['2017-05-24', '2017-05-25', '2017-05-26',
                        '2017-05-27', '2017-05-28', '2017-05-29',
                        '2017-05-30', '2017-05-31', '2017-06-01',
                        '2017-06-02', '2017-06-03', '2017-06-04',
                        '2017-06-05', '2017-06-06', '2017-06-07',
                        '2017-06-08', '2017-06-09', '2017-06-10',
                        '2017-06-11', '2017-06-12', '2017-06-13',
                        '2017-06-14', '2017-06-15', '2017-06-16',
                        '2017-06-17', '2017-06-18', '2017-06-19',
                        '2017-06-20', '2017-06-21'])
data_count = np.asarray([300.000, 301.000, 302.000, 303.000, 304.000,
                         305.000, 306.000, 307.000, 308.000, 309.000,
                         310.000, 311.000, 312.000, 230.367, 269.032,
                         258.867, 221.645, 222.323, 212.357, 198.516,
                         230.133, 243.903, 244.320, 207.451, 192.710,
                         212.033, 216.677, 222.333, 208.710])

df = pd.DataFrame({'time': data_time, 'count': data_count})
df.time = pd.to_datetime(df.time)

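regr = linear_model.LinearRegression()
# Cast the datetime64 values to float so sklearn gets a numeric feature;
# the same conversion must be used for fitting and for predicting.
X = df.time.values.astype(float).reshape(-1, 1)
regr.fit(X, df['count'].values.reshape(-1, 1))

# Predict on the same dates the model was fitted on
y_pred = regr.predict(X)
df['pred'] = y_pred.ravel()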

ax = df.plot(x='time', y='count', color='black', style='.')
df.plot(x='time', y='pred', color='orange', linewidth=3, ax=ax, alpha=0.5)
ax.set_title('My Title')
ax.set_xlabel('Date')
ax.set_ylabel('Metric')

plt.show()
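
Since the question asks to stay with plain numpy arrays, here is a minimal sketch of a pandas-free variant. It assumes the dates are ISO-formatted strings (as in the question) and converts them to float day offsets from the earliest date. The helper name to_day_offsets, the origin variable, and the shortened four-point sample are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn import linear_model

# Shortened sample: the first four points from the question's arrays.
data_time = np.asarray(['2017-05-24', '2017-05-25', '2017-05-26', '2017-05-27'])
data_count = np.asarray([300.0, 301.0, 302.0, 303.0])


def to_day_offsets(date_strings, origin):
    """Convert ISO date strings to float days elapsed since `origin`.

    Hypothetical helper for illustration only.
    """
    dates = np.asarray(date_strings).astype('datetime64[D]')
    # Dividing a timedelta64 array by a one-day timedelta yields plain floats.
    return (dates - origin) / np.timedelta64(1, 'D')


# Use the earliest date as the origin so the feature starts at 0.
origin = data_time.astype('datetime64[D]').min()
X = to_day_offsets(data_time, origin).reshape(-1, 1)

regr = linear_model.LinearRegression()
regr.fit(X, data_count)

# Predict for a later date; it must be converted with the same origin.
X_new = to_day_offsets(['2017-05-30'], origin).reshape(-1, 1)
y_new = regr.predict(X_new)
```

Measuring in days from the first date keeps the feature values small and the fitted slope readable as "change in metric per day". For plotting, the array returned by data_time.astype('datetime64[D]') can be passed to matplotlib as the x values.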

