
Time series forecasting in Python with irregular data

I have a dataset that looks like this.

Name  Date      GPA
Anna  1/1/2020  3.234
Anna  3/1/2020  3.854
Anna  4/1/2020  3.367
Anna  7/1/2020  3.578 
Anna  11/1/2020 3.678
Anna  12/1/2020 3.856
Alex  1/1/2020  3.256
Alex  6/1/2020  3.567
Alex  10/1/2020 3.976
Tara  2/1/2020  3.156
Tara  3/1/2020  3.553
Tara  4/1/2020  3.785
Tara  5/1/2020  3.574
Tara  8/1/2020  3.866
Tara  10/1/2020 3.578
Tara  12/1/2020 3.788
Kyle  4/1/2020  3.156
Kyle  6/1/2020  3.467
Kyle  7/1/2020  3.589
Kyle  8/1/2020  3.789
Kyle  11/1/2020 3.976

I want to use this dataset to do some time series forecasting in Python. The problem is that the data is very inconsistent.

The data is supposed to be monthly, but as you can see, many months are missing for each person in the dataset.

Are there any models/algorithms out there that support irregular time series data like this? If not, how would you “clean” this dataset and run it through a model?

I'm assuming the dataset needs to be cleaned somehow before being run through an algorithm.

The desired result looks like this. I want to forecast the values for the first 3 months of the next year.

Name  Date      FORECAST GPA
Anna  1/1/2021  3.466
Anna  2/1/2021  3.577
Anna  3/1/2021  3.556
Alex  1/1/2021  3.656
Alex  2/1/2021  3.547
Alex  3/1/2021  3.757
Tara  1/1/2021  3.646
Tara  2/1/2021  3.857
Tara  3/1/2021  3.544
Kyle  1/1/2021  3.466
Kyle  2/1/2021  3.968
Kyle  3/1/2021  3.756

I am fairly new to Python and have tried many methods, none successfully. Any help would be greatly appreciated.

With your data, I would transform it into a feature table and then train a Random Forest on it. Let me explain. First, encode the categorical Name column; a one-hot encoder works, and this is the easy part. Then transform the date column into trend and seasonal features using CalendarFourier and DeterministicProcess from statsmodels. Here is an example:

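import pandas as pd
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

# Assumption: a regular month-start index covering the observed period
index = pd.date_range("2020-01-01", "2020-12-01", freq="MS")

fourier = CalendarFourier(freq="A", order=3)  # yearly cycle ("A" in older pandas; newer versions use "YE")
dp = DeterministicProcess(
    index=index,                 # must be a pandas index, not a NumPy array
    order=1,                     # trend (order 1 means linear)
    seasonal=True,               # month-of-year indicators (period inferred from the index frequency)
    additional_terms=[fourier],  # annual seasonality (Fourier)
    drop=True,                   # drop terms to avoid collinearity
)
X = dp.in_sample()               # feature matrix, one row per month in the index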
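Since the rows in the question are irregular, another option is to skip the regular index and compute the time features per row by hand. The following is only a minimal sketch under some assumptions: the dates are read as month/day/year, only a few rows from the question are included, and the column names `t` and `month` are made up for illustration.

```python
import io
import pandas as pd

# A few rows from the question; the full table would be loaded the same way.
raw = """Name,Date,GPA
Anna,1/1/2020,3.234
Anna,3/1/2020,3.854
Anna,4/1/2020,3.367
Alex,1/1/2020,3.256
Alex,6/1/2020,3.567
Tara,2/1/2020,3.156
Kyle,4/1/2020,3.156
"""
df = pd.read_csv(io.StringIO(raw))
# Assumption: dates are month/day/year, i.e. the first day of each month.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Numeric time features computed per row, so gaps between months do not matter:
# "t" is the number of months since the earliest observation, "month" is the month of year.
start = df["Date"].min()
df["t"] = (df["Date"].dt.year - start.year) * 12 + (df["Date"].dt.month - start.month)
df["month"] = df["Date"].dt.month

# One-hot encode the person so one model can be shared across everyone.
X = pd.get_dummies(df[["Name", "t", "month"]], columns=["Name"], dtype=int)
y = df["GPA"]
print(X)
```

Each row now describes one observation by itself (who, when), so missing months simply mean fewer rows rather than holes that have to be filled.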

Then you have a "classical" tabular dataset, and you can train a Random Forest on it. This idea comes from this series on Kaggle. I would add that filling the missing months with linear regression or a similar method would be hard because of the "Name" column: each person is a separate series.
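To make the last step concrete, here is a rough sketch of training the Random Forest and producing the three forecast months per person. It assumes scikit-learn is installed and uses hand-built features instead of DeterministicProcess; the helper `make_features` and the settings `n_estimators=200`, `random_state=0` are illustrative choices, not tuned values.

```python
import io
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

raw = """Name,Date,GPA
Anna,1/1/2020,3.234
Anna,3/1/2020,3.854
Anna,4/1/2020,3.367
Anna,7/1/2020,3.578
Anna,11/1/2020,3.678
Anna,12/1/2020,3.856
Alex,1/1/2020,3.256
Alex,6/1/2020,3.567
Alex,10/1/2020,3.976
Tara,2/1/2020,3.156
Tara,3/1/2020,3.553
Tara,4/1/2020,3.785
Tara,5/1/2020,3.574
Tara,8/1/2020,3.866
Tara,10/1/2020,3.578
Tara,12/1/2020,3.788
Kyle,4/1/2020,3.156
Kyle,6/1/2020,3.467
Kyle,7/1/2020,3.589
Kyle,8/1/2020,3.789
Kyle,11/1/2020,3.976
"""
df = pd.read_csv(io.StringIO(raw))
# Assumption: dates are month/day/year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")


def make_features(frame, start, names):
    """Hypothetical helper: build the same feature columns for training and future rows."""
    out = pd.DataFrame(index=frame.index)
    # Months elapsed since the first observation, and month of year.
    out["t"] = (frame["Date"].dt.year - start.year) * 12 + (frame["Date"].dt.month - start.month)
    out["month"] = frame["Date"].dt.month
    # One-hot columns for each person, in a fixed order.
    for name in names:
        out[f"Name_{name}"] = (frame["Name"] == name).astype(int)
    return out


start = df["Date"].min()
names = sorted(df["Name"].unique())

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(make_features(df, start, names), df["GPA"])

# One row per person for January, February and March 2021.
future = pd.MultiIndex.from_product(
    [names, pd.date_range("2021-01-01", periods=3, freq="MS")],
    names=["Name", "Date"],
).to_frame(index=False)
future["FORECAST GPA"] = model.predict(make_features(future, start, names))
print(future)
```

One thing to keep in mind: a Random Forest predicts averages of target values it saw during training, so it cannot extrapolate a trend beyond the observed GPA range. With so few rows per person, treat the output as a rough baseline.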
