
Trend “Predictor” in Python?

I'm currently working with data frames (in pandas) that have 2 columns: the first column is some numeric quantitative data, like weight, amount of money spent on some day, GPA, etc., and the second column holds date values, i.e. the date on which the corresponding column 1 entry was added.

I was wondering, is there a way in Python to "predict" what the next value is going to be after some time X? E.g. if I have 100 weight entries spanning 2-3 months (the entries are not evenly spaced, so one entry could be on Day 3, the next on Day 5, and the next on Day 10) and I wanted to "predict" what my next entry would be 1 month from now, is there a way to do that?

I think this has something to do with Time Series Analysis, but my statistical background isn't very strong, so I don't know if that's the right approach. If it is, how could I apply it to my data frames (i.e. which packages should I use)? Would there be any significance to the value it potentially returns, or would it be meaningless in the context of what I'm working with? Thank you.

For predicting time-series data, I feel the best choice would be an LSTM, a type of recurrent neural network that is well suited to time-series regression.

If you don't want to dive deep into the backend of neural networks, I suggest using the Keras library, a high-level API that runs on top of the TensorFlow framework.

Let's say you have a 1-D array of values and you want to predict the next value. Code in Keras could look like:

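# Start off by building the training data; let arr = the list of values
window = 100  # number of previous values used to predict the next one
X = []
y = []
for i in range(len(arr) - window):   # arr needs more than `window` values
    X.append(arr[i:i + window])      # the previous `window` values form the input
    y.append(arr[i + window])        # the value that follows them is the target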

Since an LSTM takes a 3-D input, we want to reshape our X data to have 3 dimensions:

import numpy as np
X = np.array(X)
X = X.reshape(len(X), len(X[0]), 1)

Now X is in the form (samples, timesteps, features)
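To make the windowing and the reshape concrete, here is a minimal sketch on a toy list with a window of 3 instead of 100. The helper name `make_windows` and the names `X_demo` and `y_demo` are made up for illustration; they are not part of Keras or NumPy.

```python
import numpy as np

def make_windows(values, window):
    """Split a 1-D sequence into (samples, timesteps, features) inputs and next-value targets."""
    arr = np.asarray(values, dtype=float)
    if len(arr) <= window:
        raise ValueError("need more values than the window length")
    # Each row holds `window` consecutive values.
    X = np.array([arr[i:i + window] for i in range(len(arr) - window)])
    # The target for each row is the value that comes right after it.
    y = arr[window:]
    # Add a trailing axis of size 1: one feature per timestep.
    return X.reshape(len(X), window, 1), y

X_demo, y_demo = make_windows([1, 2, 3, 4, 5, 6], window=3)
print(X_demo.shape)  # (3, 3, 1)
print(y_demo)        # [4. 5. 6.]
```

Note that with only about 100 entries in total, a window of 100 leaves no training samples, so a much shorter window would be needed for data of that size.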

Here we can build a neural network using Keras:

from keras.models import Sequential
from keras.layers import Dense, LSTM

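model = Sequential()
# 32 LSTM units is an arbitrary choice; each sample has shape (timesteps, features)
model.add(LSTM(32, input_shape=(len(X[0]), 1)))
model.add(Dense(1))  # one predicted value per sample
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, np.array(y))  # targets passed as an array; runs a single epoch by default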

And voilà, you can use your model to predict the next values in your data.
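A model trained this way only predicts one step ahead. One common way to look further ahead is to feed each prediction back in as if it were an observation. The sketch below shows that loop with a stand-in predictor so it runs without Keras; `forecast_recursive` and `mean_of_window` are hypothetical names, and the Keras call in the comment is an assumption about how the trained model would be plugged in.

```python
def forecast_recursive(history, predict_next, window, steps):
    """Predict `steps` values ahead by feeding each prediction back into the input window."""
    values = list(history)
    if len(values) < window:
        raise ValueError("history is shorter than the window")
    predictions = []
    for _ in range(steps):
        next_value = float(predict_next(values[-window:]))
        predictions.append(next_value)
        values.append(next_value)  # treat the prediction as the newest observation
    return predictions

def mean_of_window(window_values):
    """Stand-in predictor (assumption): the next value is the mean of the window."""
    return sum(window_values) / len(window_values)

# With the Keras model above, the predictor could instead be something like:
#   lambda w: model.predict(np.array(w).reshape(1, len(w), 1))[0, 0]
preds = forecast_recursive([1.0, 2.0, 3.0, 4.0], mean_of_window, window=2, steps=3)
print(preds)  # [3.5, 3.75, 3.625]
```

Keep in mind that one step here means one entry, not one calendar day, so with unevenly spaced entries the horizon in days is only approximate, and errors compound the further ahead you go.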

Statsmodels is a Python module that provides one of the best-known methods in time series forecasting (ARIMA).

An example can be seen at the following link: https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/

Other methods for time series forecasting, such as support vector regression, Holt-Winters and simple exponential smoothing, are available in other libraries.
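To give a feel for the simplest of these, here is a plain-Python sketch of simple exponential smoothing. The function name and the choice `alpha=0.5` are for illustration only; a library implementation would normally estimate the smoothing factor from the data.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # start from the first observation
    levels = [level]
    for x in values[1:]:
        # New level = weighted average of the new observation and the old level.
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels

weights = [80.0, 79.0, 79.5, 78.0]
levels = simple_exponential_smoothing(weights, alpha=0.5)
forecast = levels[-1]  # the forecast for every future step is the last level
print(levels)    # [80.0, 79.5, 79.5, 78.75]
print(forecast)  # 78.75
```

Simple exponential smoothing produces a flat forecast, so it cannot extrapolate a trend. Holt-Winters extends the same idea with trend and seasonal components.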

Spark-ts ( https://github.com/sryza/spark-timeseries ) is a time series library that supports Python and provides methods like ARIMA, Holt-Winters and the exponentially weighted moving average.

Libsvm ( https://github.com/cjlin1/libsvm ) provides support vector regression methods.
