I have a dataset that looks like this.
Name Date GPA
Anna 1/1/2020 3.234
Anna 3/1/2020 3.854
Anna 4/1/2020 3.367
Anna 7/1/2020 3.578
Anna 11/1/2020 3.678
Anna 12/1/2020 3.856
Alex 1/1/2020 3.256
Alex 6/1/2020 3.567
Alex 10/1/2020 3.976
Tara 2/1/2020 3.156
Tara 3/1/2020 3.553
Tara 4/1/2020 3.785
Tara 5/1/2020 3.574
Tara 8/1/2020 3.866
Tara 10/1/2020 3.578
Tara 12/1/2020 3.788
Kyle 4/1/2020 3.156
Kyle 6/1/2020 3.467
Kyle 7/1/2020 3.589
Kyle 8/1/2020 3.789
Kyle 11/1/2020 3.976
I want to use this dataset to do some time series forecasting in Python. The problem is that this data is very inconsistent.
The time variable is supposed to have a monthly frequency, but as you can see, many months are missing for each person in the dataset.
Are there any models/algorithms out there that support irregular time series data like this? If not, how would you “clean” this dataset and run it through a model?
I'm assuming the dataset needs to be cleaned somehow before it is run through an algorithm.
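A minimal pandas sketch of the gap problem, using only Anna's and Alex's rows: pivoting to one column per person over an assumed January-December 2020 calendar makes the missing months explicit as NaN. Linear interpolation between observed months is shown as one simple way to "clean" them; it assumes GPA changes linearly between records, which may not hold.

```python
import io

import pandas as pd

# Two of the people from the table above, in the same whitespace-separated layout.
RAW = """Name Date GPA
Anna 1/1/2020 3.234
Anna 3/1/2020 3.854
Anna 4/1/2020 3.367
Anna 7/1/2020 3.578
Anna 11/1/2020 3.678
Anna 12/1/2020 3.856
Alex 1/1/2020 3.256
Alex 6/1/2020 3.567
Alex 10/1/2020 3.976
"""

df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
df["Month"] = pd.to_datetime(df["Date"], format="%m/%d/%Y").dt.to_period("M")

# One column per person, one row per calendar month of 2020 (assumed range);
# months without an observation become NaN instead of silently disappearing.
calendar = pd.period_range("2020-01", "2020-12", freq="M")
wide = df.pivot(index="Month", columns="Name", values="GPA").reindex(calendar)

# limit_area="inside" fills only gaps between two observations, so months
# before a person's first record or after their last one stay NaN.
filled = wide.interpolate(limit_area="inside")

print(wide.isna().sum())  # number of missing months per person
print(filled.round(3))
```

Here Anna has 6 missing months and Alex 9; after interpolation Alex's November and December are still empty, because nothing after October can be interpolated.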
The desired result looks like this; I want to forecast the values for the first 3 months of the next year.
Name Date FORECAST GPA
Anna 1/1/2021 3.466
Anna 2/1/2021 3.577
Anna 3/1/2021 3.556
Alex 1/1/2021 3.656
Alex 2/1/2021 3.547
Alex 3/1/2021 3.757
Tara 1/1/2021 3.646
Tara 2/1/2021 3.857
Tara 3/1/2021 3.544
Kyle 1/1/2021 3.466
Kyle 2/1/2021 3.968
Kyle 3/1/2021 3.756
I am fairly new to Python and have tried many methods, none of them successfully. Any help would be greatly appreciated.
With your data, I would transform it into a regular tabular dataset and then train a Random Forest on it. Let me explain. First, encode your categorical column, the names; you could use a one-hot encoder, which is the easy part. Then you can turn your datetime column into calendar features with CalendarFourier (together with DeterministicProcess; both are in statsmodels.tsa.deterministic). Here is an example:
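import pandas as pd
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess
# Build the features on a regular monthly calendar covering the data (Jan-Dec 2020),
# not on the irregular dates themselves; each row of the data can then be matched to its month.
months = pd.period_range("2020-01", "2020-12", freq="M")
fourier = CalendarFourier(freq="A", order=3)  # annual Fourier terms ("A" = yearly cycle)
dp = DeterministicProcess(
    index=months,
    order=1,                     # trend (order 1 means linear)
    seasonal=True,               # monthly seasonality (indicators)
    additional_terms=[fourier],  # annual seasonality (Fourier)
    drop=True,                   # drop terms to avoid collinearity
)
X = dp.in_sample()  # one row of calendar features per month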
Then you have a "classical" tabular dataset and can train a Random Forest on it. This idea is taken from this series on Kaggle. I would add that filling in the missing months with linear regression or similar methods will be hard, because the "Name" column splits the data into several separate series.
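Putting the pieces together, here is a minimal end-to-end sketch of this approach on the question's data. Assumptions: the trend starts at January 2020 (START); the annual Fourier terms are computed by hand with numpy as a stand-in for CalendarFourier(freq="A", order=3), so the block needs only pandas, numpy and scikit-learn; make_features and FORECAST_GPA are hypothetical names; and the RandomForestRegressor settings are arbitrary. The model is trained only on the months that were actually observed, so no gap filling is needed.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# The full table from the question, whitespace-separated.
RAW = """Name Date GPA
Anna 1/1/2020 3.234
Anna 3/1/2020 3.854
Anna 4/1/2020 3.367
Anna 7/1/2020 3.578
Anna 11/1/2020 3.678
Anna 12/1/2020 3.856
Alex 1/1/2020 3.256
Alex 6/1/2020 3.567
Alex 10/1/2020 3.976
Tara 2/1/2020 3.156
Tara 3/1/2020 3.553
Tara 4/1/2020 3.785
Tara 5/1/2020 3.574
Tara 8/1/2020 3.866
Tara 10/1/2020 3.578
Tara 12/1/2020 3.788
Kyle 4/1/2020 3.156
Kyle 6/1/2020 3.467
Kyle 7/1/2020 3.589
Kyle 8/1/2020 3.789
Kyle 11/1/2020 3.976
"""

START = pd.Period("2020-01", freq="M")  # assumed origin of the linear trend


def make_features(frame: pd.DataFrame, names: list) -> pd.DataFrame:
    """Trend + annual Fourier terms + one-hot Name, one row per input row."""
    year = frame["Month"].dt.year.to_numpy()
    month = frame["Month"].dt.month.to_numpy()
    feats = pd.DataFrame(index=frame.index)
    # Linear trend: months elapsed since START; gaps between rows are harmless.
    feats["trend"] = (year - START.year) * 12 + (month - START.month)
    # Annual Fourier pairs of order 1..3 (one full cycle per 12 months).
    angle = 2 * np.pi * (month - 1) / 12
    for k in range(1, 4):
        feats[f"sin_{k}"] = np.sin(k * angle)
        feats[f"cos_{k}"] = np.cos(k * angle)
    # One-hot Name with a fixed column list so training and future rows match.
    for name in names:
        feats[f"Name_{name}"] = (frame["Name"] == name).astype(int).to_numpy()
    return feats


df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
df["Month"] = pd.to_datetime(df["Date"], format="%m/%d/%Y").dt.to_period("M")
names = sorted(df["Name"].unique())

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(make_features(df, names), df["GPA"])

# Every person x January-March 2021.
future = pd.MultiIndex.from_product(
    [names, pd.period_range("2021-01", periods=3, freq="M")],
    names=["Name", "Month"],
).to_frame(index=False)
future["FORECAST_GPA"] = model.predict(make_features(future, names)).round(3)
print(future)
```

One design caveat: a random forest predicts averages of training targets, so it cannot extrapolate the trend. Every 2021 row lies beyond the largest trend value seen in training and is treated like the latest 2020 months on that feature. With only 21 observations, treat the forecasts as illustrative.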