Deep learning: training on a dataset that has gaps

I have a dataset of a sensor (station) for several years with this structure:

station  Direction  year  month  day  dayOfweek  hour  volume
1009     3          2015  1      1    5          0     37
1009     3          2015  1      1    5          1     20
1009     3          2015  1      1    5          2     24
...      .          ..    ..     ..   ..         ..    ..

There are many gaps (missing values) in this data. For example, a whole month or several days may be missing. I fill the missing volumes with 0. I want to predict volume based on previous data. I used an LSTM, and the mean absolute percentage error (MAPE) is quite high, around 20, and I need to reduce it.

The main problem is that even the training data has gaps. Is there any other technique in deep learning for this kind of data?

There are multiple ways to handle missing values as listed here ( https://machinelearningmastery.com/handle-missing-data-python/ ).

If I have enough data, I simply omit the rows with missing data. If I do not have enough data, and/or I have to predict on cases where data is missing, I normally try the following two approaches and choose the one with the higher accuracy.
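Omitting incomplete rows can be sketched with pandas as follows. This is only an illustration: the tiny frame is made-up sample data that mirrors a few of the question's columns, and it assumes missing readings are stored as NaN rather than already filled with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; NaN marks a missing volume reading.
df = pd.DataFrame({
    "station": [1009, 1009, 1009, 1009],
    "hour": [0, 1, 2, 3],
    "volume": [37.0, np.nan, 24.0, np.nan],
})

# Keep only the rows where the target column was actually observed.
complete = df.dropna(subset=["volume"]).reset_index(drop=True)
print(complete)
```

For a sequence model such as an LSTM, note that dropping rows makes the remaining time steps unevenly spaced, so this may only be reasonable when the gaps are rare.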

The first is the same as yours: I choose a distinct value that does not occur in the dataset, like 0 in your case, and fill in that value. The other approach is to use the mean or median of the training set. I use that same value (calculated on the training set) in my validation and test sets. The median is better than the mean when the mean does not make sense in the current context (2014.5 as a year, for example).
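A minimal sketch of both fill strategies with pandas is shown here. The frames, the `SENTINEL` constant, and the variable names are made up for illustration, and missing volumes are assumed to be stored as NaN. The key point is that the median is computed on the training set only and then reused for the test set.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test splits; NaN marks a missing volume reading.
train = pd.DataFrame({"volume": [37.0, 20.0, np.nan, 24.0, 100.0]})
test = pd.DataFrame({"volume": [np.nan, 30.0]})

# Approach 1: fill with a distinct value (0, as in the question).
SENTINEL = 0.0
train_sentinel = train["volume"].fillna(SENTINEL)
test_sentinel = test["volume"].fillna(SENTINEL)

# Approach 2: fill with the median of the training set.
# Series.median() skips NaN by default.
train_median = train["volume"].median()
train_filled = train["volume"].fillna(train_median)
# Reuse the training-set statistic so no test information leaks in.
test_filled = test["volume"].fillna(train_median)

print(train_median)
print(test_filled.tolist())
```

Whichever variant gives the lower validation error can then be kept, as described above.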
