[英]How to get slope from timeseries data in pandas?
I have a pandas dataframe which contains date
and some values something like below我有一个包含
date
和一些值的熊猫数据框,如下所示
Original data:原始数据:
list = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
('2018-10-30', 2.19), ('2018-10-30', 1.88)]
After loading into pandas加载到pandas后
df = pd.DataFrame(list)
0 1
0 2018-10-29 6.192500
1 2018-10-29 6.195000
2 2018-10-29 1.958333
3 2018-10-29 1.785000
4 2018-10-29 3.050000
5 2018-10-29 1.306667
6 2018-10-29 1.632500
7 2018-10-30 1.765000
8 2018-10-30 1.265000
9 2018-10-30 2.112500
10 2018-10-30 2.167143
11 2018-10-30 1.485000
12 2018-10-30 1.720000
13 2018-10-30 2.754000
14 2018-10-30 1.796667
15 2018-10-30 1.278333
16 2018-10-30 3.480000
17 2018-10-30 6.190000
18 2018-10-30 6.235000
19 2018-10-30 6.118571
20 2018-10-30 6.088000
21 2018-10-30 4.300000
22 2018-10-30 7.806667
23 2018-10-30 7.783333
24 2018-10-30 10.976667
25 2018-10-30 2.190000
26 2018-10-30 1.880000
This is how I load up the dataframe这就是我加载数据框的方式
df = pd.DataFrame(list)
df[0] = pd.to_datetime(df[0], errors='coerce')
df.set_index(0, inplace=True)
Now I want to find the slope
.现在我想找到
slope
。 Upon research in the internet, I found this is what is needed to be done to get the slope
在互联网上研究后,我发现这是获得
slope
所需要做的
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear',axis=0).fillna(0).values)))
results = sm.OLS(np.asarray(sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear', axis=0).fillna(0).values), sm.add_constant(np.array([i for i in range(len(trend_coord))])), missing='drop').fit()
slope = results.params[1]
print(slope)
But I get the below error但我收到以下错误
Traceback (most recent call last):
File "/home/souvik/Music/UI_Server2/test35.py", line 11, in <module>
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear',axis=0).fillna(0).values)))
File "/home/souvik/django_test/webdev/lib/python3.5/site-packages/statsmodels/tsa/seasonal.py", line 127, in seasonal_decompose
raise ValueError("You must specify a freq or x must be a "
ValueError: You must specify a freq or x must be a pandas object with a timeseries index with a freq not set to None
Now if I add a freq
parameter to the seasonal_decompose method such as现在,如果我将
freq
参数添加到 season_decompose 方法中,例如
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1).trend.interpolate(method='linear',axis=0).fillna(0).values)))
Then I get an error like然后我得到一个错误
Traceback (most recent call last):
File "/home/souvik/Music/UI_Server2/test35.py", line 11, in <module>
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1).trend.interpolate(method='linear',axis=0).fillna(0).values)))
AttributeError: 'numpy.ndarray' object has no attribute 'interpolate'
However if I get rid of any fine graining of data such as interpolate
etc and do something like below但是,如果我摆脱了任何细粒度的数据(例如
interpolate
等)并执行以下操作
trend_coord = sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1, model='additive').trend
results = sm.OLS(np.asarray(trend_coord),
sm.add_constant(np.array([i for i in range(len(trend_coord))])), missing='drop').fit()
slope = results.params[1]
print(">>>>>>>>>>>>>>>> slope", slope)
Then I get a slope
value of 0.13668559218559242
.然后我得到
0.13668559218559242
的slope
值。
But I am not sure if this is the correct way to find out the slope
and if even the value is right.但我不确定这是否是找出
slope
的正确方法,甚至该值是否正确。
Is there a better way to find out slope
?有没有更好的方法来找出
slope
?
I will take part of Franco answer, but you don't need sklearn.我会参加佛朗哥的回答,但你不需要 sklearn。 You can easily do it with scipy.
您可以使用 scipy 轻松完成。
import datetime as dt
from scipy import stats
df = pd.DataFrame(list, columns=['date', 'value'])
df.date =pd.to_datetime(df.date)
df['date_ordinal'] = pd.to_datetime(df['date']).map(dt.datetime.toordinal)
slope, intercept, r_value, p_value, std_err = stats.linregress(df['date_ordinal'], df['value'])
slope
Out[114]: 0.80959404761905
You can use datetime.toordinal
to map each date to an integer and sklearn.linear_model
to fit a linear regression model on your data to obtain the slope like:您可以使用
datetime.toordinal
将每个日期映射到一个整数,并使用sklearn.linear_model
在您的数据上拟合线性回归模型以获得如下斜率:
import datetime as dt
from sklearn import linear_model
df = pd.DataFrame(list, columns=['date', 'value'])
df['date_ordinal'] = pd.to_datetime(df['date']).map(dt.datetime.toordinal)
reg = linear_model.LinearRegression()
reg.fit(df['date_ordinal'].values.reshape(-1, 1), df['value'].values)
reg.coef_
array([0.80959405])
To get the slope and intercept of a linear regression line (y = intercept + slope * x) for a simple case like this, you need to use numpy polyfit() method.要在这样的简单情况下获得线性回归线的斜率和截距(y = 截距 + 斜率 * x),您需要使用 numpy polyfit() 方法。 My explanation is inline with code below.
我的解释与下面的代码一致。
# You should only need numpy and pandas
import numpy as np
import pandas as pd
# Now your list
list = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
('2018-10-30', 2.19), ('2018-10-30', 1.88)]
# Create a single pandas DataFrame
df = pd.DataFrame(list)
# Make it into a Time Series with 'date' and 'value' columns
ts = pd.DataFrame(list, columns=['date', 'value'])
#print it to check it
ts.head(10)
# Now separate it into x and y lists
x = ts['date']
y = ts['value'].astype(float)
# Create a sequance of integers from 0 to x.size to use in np.polyfit() call
x_seq = np.arange(x.size) # should give you [ 0 1 2 3 4 ... 26]
# call numpy polyfit() method with x_seq, y
fit = np.polyfit(x_seq, y, 1)
fit_fn = np.poly1d(fit)
print('Slope = ', fit[0], ", ","Intercept = ", fit[1])
print(fit_fn)
Slope = 0.1366855921855925 , Intercept = 1.9827865961199274斜率 = 0.1366855921855925,截距 = 1.9827865961199274
0.1367 x + 1.983 0.1367 x + 1.983
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.