
BERT for time series classification

I'd like to train a transformer encoder (e.g. BERT) on time-series data for a task that can be modeled as classification. Let me briefly describe the data I'm using before talking about the issue I'm facing.

I'm working with 90-second windows, and I have access to 100 values for each second (i.e. 90 vectors of size 100). My goal is to predict a binary label (0 or 1) for each second (i.e. produce a final vector of 0s and 1s of length 90).

My first idea was to model this as a multi-label classification problem, where I would use BERT to produce a vector of size 90 filled with numbers between 0 and 1 and regress using nn.BCELoss against the ground-truth labels (y_true looks like [0,0,0,1,1,1,0,0,1,1,1,0,...,0]). A simple analogy would be to consider each second as a word, and the 100 values I have access to as the corresponding word embedding. The goal is then to train BERT (from scratch) on these sequences of 100-dim embeddings (all sequences have the same length: 90).
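To make the framing concrete, here is a minimal sketch of that loss setup (the shapes and the random stand-in scores are assumptions for illustration, not my actual pipeline):

```python
import torch
import torch.nn as nn

batch_size, seq_len, n_features = 8, 90, 100

# 90 seconds of data, 100 values per second
x = torch.randn(batch_size, seq_len, n_features)
# per-second binary ground truth, e.g. [0,0,0,1,1,1,...]
y_true = torch.randint(0, 2, (batch_size, seq_len)).float()

# stand-in for the encoder output: one raw score per second
logits = torch.randn(batch_size, seq_len)          # shape (8, 90)
probs = torch.sigmoid(logits)

loss = nn.BCELoss()(probs, y_true)                 # or BCEWithLogitsLoss on the raw logits
print(loss.item())
```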

The problem: when dealing with textual inputs, we simply add the CLS and SEP tokens to the input sequences, and let the tokenizer and the model do the rest of the job. When training directly on embeddings, what should we do to account for CLS and SEP tokens?

One idea I had was to add a 100-dim embedding at position 0 standing for the CLS token, as well as a 100-dim embedding at position 91 (right after the 90 data vectors) standing for the SEP token. But I don't know what embeddings I should use for these two tokens, and I'm not sure that's a good solution either.
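For concreteness, one way to realise this idea would be to make the two special vectors learnable parameters and concatenate them around each window. This is only a sketch under that assumption; the module name and initialisation are made up for illustration:

```python
import torch
import torch.nn as nn

class WithSpecialTokens(nn.Module):
    def __init__(self, n_features: int = 100):
        super().__init__()
        self.cls_emb = nn.Parameter(torch.randn(n_features))  # learned "CLS" vector
        self.sep_emb = nn.Parameter(torch.randn(n_features))  # learned "SEP" vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, 90, 100) -> (batch_size, 92, 100)
        batch_size = x.size(0)
        cls = self.cls_emb.expand(batch_size, 1, -1)
        sep = self.sep_emb.expand(batch_size, 1, -1)
        return torch.cat([cls, x, sep], dim=1)

x = torch.randn(4, 90, 100)
print(WithSpecialTokens()(x).shape)  # torch.Size([4, 92, 100])
```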

Any ideas?

(I tried asking this question on Huggingface forums but didn't get any response.)

While HuggingFace is very good for NLP, I would not recommend using it for any time series problem. With respect to tokens, there is no reason to use CLS or SEP, as you don't need them. The simplest way would be to feed the model data in the format (batch_size, seq_len, n_features) and have it predict (batch_size, seq_len); in your case the input would look like (batch_size, 90, 100) and the model would return a tensor of shape (batch_size, 90). That is unless you think there are temporal dependencies between windows, in which case you could use a rolling historical window. Secondly, I suggest you look at some papers that discuss transformers for time series.
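A minimal sketch of that setup with a vanilla PyTorch encoder (all hyperparameters here are assumptions, not a recommendation): project the 100-dim inputs to the model dimension, run a TransformerEncoder, and attach a per-second linear head, with no special tokens anywhere.

```python
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, n_features: int = 100, seq_len: int = 90,
                 d_model: int = 128, nhead: int = 8, num_layers: int = 4):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)             # 100-dim -> model dim
        self.pos_emb = nn.Parameter(torch.randn(1, seq_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                             # one logit per second

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, 90, 100) -> logits: (batch_size, 90)
        h = self.input_proj(x) + self.pos_emb
        h = self.encoder(h)
        return self.head(h).squeeze(-1)

model = TimeSeriesTransformer()
x = torch.randn(8, 90, 100)
y = torch.randint(0, 2, (8, 90)).float()
loss = nn.BCEWithLogitsLoss()(model(x), y)   # per-second binary loss
loss.backward()
```

Using BCEWithLogitsLoss on the raw logits avoids a separate sigmoid and is numerically more stable than sigmoid + BCELoss.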

If you are looking for time series libraries that include the transformer, check out Flow Forecast or transformer time series prediction for actual examples of using the transformer on time series data.
