How to handle multiple sequences when training a Hidden Markov Model with hmmlearn?

This question is also on Cross-Validated SE

Introduction

I'm working with time series data describing the power consumption of 5 devices. My goal is to train a best-fitting Hidden Markov Model for each device and then do classification (i.e. given a power consumption series, tell which device it came from) based on the likelihood scores of the individual models. Observations come from 7 days:

[image]

Measurements are not continuous, though: on some days they cover the whole day, while on others they cover just a part of it, say 7 hours. For example, this is the data for one of the devices, split into days:

[image]

I have a problem understanding how to pass training data to a model from the hmmlearn library (suppose we take the first 5 days for training, and the remaining 2 become the eval and test subsets). Currently it's done this way:

    from hmmlearn import hmm

    model = hmm.GaussianHMM(n_components=n_hidden_states, n_iter=n_iter, random_state=seed)
    model.fit(train_dtf[device_name].to_numpy().reshape(-1, 1))

where train_dtf.head(1) looks like this:

[image]

and contains observations with day_index from 0 to 4. What I understand from the hmmlearn docs is that the data passed to model.fit is always a single array; if there are several sequences, they must be concatenated first. I'm not sure that makes sense.

Questions

  1. Isn't it misleading for the model to assume that this is one time series? Especially if the observation periods differ between days. My intuition is that we should somehow indicate that a new day starts at a given point, and thus a new pattern starts. Is my intuition right?

  2. If so, how can I handle it? My first idea is to train the model on one day, save the parameters, then retrain the model on the next day using the saved parameters as the starting point, and so on until all training days are used. I'm not confident about this solution, though, because I can't explain why it should work.

  3. Could someone propose another method that is better suited to this particular task? I'm going to try DTW for sure, but I'm wondering if there are other tools.

The hmmlearn library supports training on multiple sequences. According to the documentation for fit, you pass the sequences concatenated into a single array and use the lengths argument to tell fit how long each one is:

lengths (array-like of integers, shape (n_sequences,)) – Lengths of the individual sequences in X. The sum of these should be n_samples.

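Suppose you have hourly data for 2 days: you have 2 * 24 = 48 observations. The lengths argument would be [24, 24] to give this information to the model. With zero-based indexing, the first observation of the second day (index 24) doesn't use information from the last observation of the first day (index 23); instead, its state distribution is initialized from startprob_, just as for index 0. The days do not need to be equally long: a full day followed by a 7-hour day would simply be [24, 7].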
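As a rough sketch of how lengths could be derived for data that is already concatenated and carries a day_index column, as in the question, the snippet below counts the run of samples belonging to each day. The helper name split_lengths and the synthetic readings are assumptions made for illustration, not part of hmmlearn, and the rows are assumed to be ordered by day.

```python
import numpy as np


def split_lengths(day_index):
    """Return the length of each run of consecutive equal day_index values.

    Hypothetical helper: assumes the rows are already ordered by day.
    """
    day_index = np.asarray(day_index)
    # Positions at which a new day begins (the value differs from the previous row).
    boundaries = np.flatnonzero(day_index[1:] != day_index[:-1]) + 1
    # Add the start and end of the array, then take the differences between edges.
    edges = np.concatenate(([0], boundaries, [len(day_index)]))
    return np.diff(edges).tolist()


# Synthetic stand-in for one device: two full days and one 7-hour day in between.
rng = np.random.default_rng(0)
day_index = np.repeat([0, 1, 2], [24, 7, 24])
power = rng.normal(loc=100.0, scale=10.0, size=day_index.size)

X = power.reshape(-1, 1)             # shape (n_samples, 1), as fit expects
lengths = split_lengths(day_index)   # one entry per day
```

With hmmlearn installed, X and lengths would then be passed together as model.fit(X, lengths=lengths). The score method accepts the same lengths argument, so held-out days can be evaluated as separate sequences in the same way.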
