
Predicting y values of time series data in Python using linear regression

I want to predict the Y values, which represent the number of A-type clients per unit of time, using linear regression, where the X values are time series data.

The code is:

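    import numpy as np
    import pandas as pd

    # past_time_array: timestamps; client_A_array: A-type client counts
    df1 = pd.DataFrame({'time': past_time_array, 'A_clients': client_A_array})
    x_a = np.arange(len(past_time_array))  # 0, 1, 2, ... used as the regressor
    fit_A = np.polyfit(x_a, df1['A_clients'], 1)  # degree-1 least-squares fit
    fit_fn_A = np.poly1d(fit_A)

    print(df1)
    print("fitness function = %s" % fit_fn_A)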

Printing df1 gives:

   A_clients                time
0           0 2018-02-09 14:45:00
1           0 2018-02-09 14:46:00
2           1 2018-02-09 14:47:00
3           4 2018-02-09 14:48:00
4           4 2018-02-09 14:49:00
5           2 2018-02-09 14:50:00
6           2 2018-02-09 14:51:00
7           2 2018-02-09 14:52:00
8           2 2018-02-09 14:53:00
9           4 2018-02-09 14:54:00
10          1 2018-02-09 14:55:00
11          3 2018-02-09 14:56:00
12          4 2018-02-09 14:57:00
13          2 2018-02-09 14:58:00
14          4 2018-02-09 14:59:00
15          3 2018-02-09 15:00:00
16          1 2018-02-09 15:01:00
17          1 2018-02-09 15:02:00
18          0 2018-02-09 15:03:00
19          4 2018-02-09 15:04:00
20          1 2018-02-09 15:05:00
21          1 2018-02-09 15:06:00
22          4 2018-02-09 15:07:00
23          4 2018-02-09 15:08:00

Printing the fitness function gives:

0.0001389 x + 2.213

The issue is that when I try to predict values like this:

    predicted_ta = fit_fn_A(x_a[10])
    print("predicted values = %f" % predicted_ta)

it always gives me 2.213, which is the c value (the intercept) in y = mx + c.
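To see why a prediction can sit so close to c, it may help to look at the fitted coefficients directly. The following is a minimal sketch that uses only the 24 rows printed above; the question's full arrays are not shown, so the coefficients it prints need not match 0.0001389 and 2.213. The names `counts`, `slope`, `intercept` and `future_x` are introduced here for illustration.

```python
import numpy as np

# A_clients column from the 24 rows printed above (one value per minute)
counts = np.array([0, 0, 1, 4, 4, 2, 2, 2, 2, 4, 1, 3,
                   4, 2, 4, 3, 1, 1, 0, 4, 1, 1, 4, 4])
x_a = np.arange(len(counts))

# np.polyfit returns coefficients from highest degree to lowest: [m, c]
slope, intercept = np.polyfit(x_a, counts, 1)
fit_fn_A = np.poly1d([slope, intercept])

# Evaluating the fitted line at x is just m * x + c
predicted = fit_fn_A(x_a[10])
print("slope = %.4f, intercept = %.4f" % (slope, intercept))
print("prediction at x = 10: %.4f" % predicted)

# Indices past the end of the data extrapolate the same line forward
future_x = np.arange(len(counts), len(counts) + 3)
print("next three predictions:", fit_fn_A(future_x))
```

With the slope reported in the question (0.0001389), the term m * x adds only about 0.0014 at x = 10, so the prediction is barely distinguishable from c.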

The best-fit line is shown below.

[image: best-fit line]

Edit 1

The regression line has some slope when I count the number of clients every 2 minutes instead of every minute.

[image: regression line for client counts per 2 minutes]

The values were being predicted correctly all along. Earlier I was counting the number of clients per minute, and the fitted line for that data is almost flat, as shown above. When I computed the regression line for the number of clients per 2 minutes, the fitness function gave the expected result.
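The re-binning described in this edit could be sketched roughly as follows. This is an illustration on the 24 rows printed above, not the asker's actual code; it assumes the per-minute series has an even number of samples so it can be summed in pairs, and the names `counts_2min` and `x_2min` are hypothetical.

```python
import numpy as np

# A_clients column from the 24 rows printed above (one value per minute)
counts = np.array([0, 0, 1, 4, 4, 2, 2, 2, 2, 4, 1, 3,
                   4, 2, 4, 3, 1, 1, 0, 4, 1, 1, 4, 4])

# Sum each pair of consecutive minutes to get clients per 2 minutes
counts_2min = counts.reshape(-1, 2).sum(axis=1)
x_2min = np.arange(len(counts_2min))

# Fit a line to each version of the series
slope_1min, intercept_1min = np.polyfit(np.arange(len(counts)), counts, 1)
slope_2min, intercept_2min = np.polyfit(x_2min, counts_2min, 1)

print("per-minute fit:   slope = %.4f, intercept = %.4f"
      % (slope_1min, intercept_1min))
print("per-2-minute fit: slope = %.4f, intercept = %.4f"
      % (slope_2min, intercept_2min))
```

Summing into wider bins rescales both axes: each y value is a 2-minute total and each x step covers 2 minutes. A numerically larger slope therefore does not by itself indicate a stronger trend.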

You cannot apply this model here. There is no dependence at all.
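One way to check how little linear dependence there is would be to compute the Pearson correlation between the time index and the counts. The sketch below does this for the 24 rows printed in the question; the variable names are chosen for the example, and a longer series may give different numbers.

```python
import numpy as np

# A_clients column from the 24 rows printed in the question
counts = np.array([0, 0, 1, 4, 4, 2, 2, 2, 2, 4, 1, 3,
                   4, 2, 4, 3, 1, 1, 0, 4, 1, 1, 4, 4])
x_a = np.arange(len(counts))

# Pearson correlation between the time index and the counts
r = np.corrcoef(x_a, counts)[0, 1]

# For a straight-line fit with an intercept, r**2 is the fraction of
# the variance in the counts that the line explains
r_squared = r ** 2
print("r = %.3f, r^2 = %.3f" % (r, r_squared))
```

An r^2 close to 0 means the fitted line explains almost none of the variation, so its predictions stay near the mean of the data.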

Try computing the cumulative number of clients instead (value[x] = sum(value[:x])). It usually fits pretty well with a log() model.
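A rough sketch of this suggestion, under a few assumptions: the cumulative value is read as a running total (np.cumsum, which includes the current sample), time is indexed from 1 so that log(t) is defined, and the log model is taken to be a * log(t) + b. The helper `r_squared` is defined here for the example. A straight-line fit to the same cumulative series is included for comparison, since whether the log model is the better choice depends on the data.

```python
import numpy as np

# A_clients column from the 24 rows printed in the question
counts = np.array([0, 0, 1, 4, 4, 2, 2, 2, 2, 4, 1, 3,
                   4, 2, 4, 3, 1, 1, 0, 4, 1, 1, 4, 4])

cumulative = np.cumsum(counts)        # running total of clients
t = np.arange(1, len(counts) + 1)     # start at 1 so log(t) is defined


def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Log model: cumulative ~ a * log(t) + b, fitted by least squares on log(t)
a, b = np.polyfit(np.log(t), cumulative, 1)
log_fit = a * np.log(t) + b

# Straight-line model on the same cumulative series, for comparison
m, c = np.polyfit(t, cumulative, 1)
lin_fit = m * t + c

print("log model:    R^2 = %.3f" % r_squared(cumulative, log_fit))
print("linear model: R^2 = %.3f" % r_squared(cumulative, lin_fit))
```

Comparing the two R^2 values on real data gives a quick indication of which shape describes the cumulative series better.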
