
How to use LSTM for sequence labelling in Python?

I want to build a classifier that provides labels given a time series of vectors. I have the code for a static LSTM-based classifier, but I don't know how I can incorporate the time information:

Training set:

time   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12,13,14,15,16,17,18]
f1     = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
f2     = [2, 1, 3, 2, 4, 2, 3, 1, 9, 2, 1, 2, 1, 6, 1, 8, 2, 2]
labels = [a, a, b, b, a, a, b, b, a, a, b, b, a, a, b, b, a, a]

Test set:

time   = [1, 2, 3, 4, 5, 6]
f1     = [2, 2, 2, 1, 1, 1]
f2     = [2, 1, 2, 1, 6, 1]
labels = [?, ?, ?, ?, ?, ?]
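For illustration, the training set above can be arranged as one ordered list of per-time-step pairs, so that the position in the list carries the time information. This is only a sketch: writing the labels as strings and mapping them to integers through `label_to_index` are assumptions, not something the post specifies.

```python
# Feature columns and labels copied from the training set above.
# The labels are written as strings here (an assumption; the post leaves them unquoted).
f1 = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
f2 = [2, 1, 3, 2, 4, 2, 3, 1, 9, 2, 1, 2, 1, 6, 1, 8, 2, 2]
labels = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'a',
          'a', 'b', 'b', 'a', 'a', 'b', 'b', 'a', 'a']

# Hypothetical encoding of the two label symbols as integers.
label_to_index = {'a': 0, 'b': 1}

# One (input vector, target) pair per time step, kept in time order.
steps = [((x1, x2), label_to_index[y]) for x1, x2, y in zip(f1, f2, labels)]
```

Feeding these pairs to a sequential dataset in this order, one time step per sample, is what lets a recurrent network see the ordering.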

Following this post, I implemented the following in pybrain:

from pybrain.datasets import SequentialDataSet
from itertools import cycle
import matplotlib.pyplot as plt
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import LSTMLayer
from pybrain.supervised import RPropMinusTrainer
from sys import stdout

data = [1,2,3,4,5,6,7]

ds = SequentialDataSet(1, 1)
for sample, next_sample in zip(data, cycle(data[1:])):
    ds.addSample(sample, next_sample)

print(ds)
# The input size (first argument) must match the dataset's input dimension, which is 1 here.
net = buildNetwork(1, 5, 1, hiddenclass=LSTMLayer, outputbias=False, recurrent=True)


trainer = RPropMinusTrainer(net, dataset=ds)
train_errors = [] # save errors for plotting later
EPOCHS_PER_CYCLE = 5
CYCLES = 100
EPOCHS = EPOCHS_PER_CYCLE * CYCLES
for i in range(CYCLES):
    trainer.trainEpochs(EPOCHS_PER_CYCLE)
    train_errors.append(trainer.testOnData())
    epoch = (i+1) * EPOCHS_PER_CYCLE
    # "\r" together with end="" rewrites the same console line on each update
    print("\r epoch {}/{}".format(epoch, EPOCHS), end="")
    stdout.flush()

print()
print("final error =", train_errors[-1])

plt.plot(range(0, EPOCHS, EPOCHS_PER_CYCLE), train_errors)
plt.xlabel('epoch')
plt.ylabel('error')
plt.show()

for sample, target in ds.getSequenceIterator(0):
    print("               sample = %4.1f" % sample)
    print("predicted next sample = %4.1f" % net.activate(sample))
    print("   actual next sample = %4.1f" % target)
    print()

This trains a classifier, but I don't know how to incorporate the time information. How can I include the information about the order of the vectors?

This is how I implemented my sequence labeling. I have six classes of labels and 20 sample sequences for each class. Each sequence consists of 100 timesteps of datapoints with 10 variables.

from pybrain.datasets import SequenceClassificationDataSet

input_variable = 10
output_class = 1
trndata = SequenceClassificationDataSet(input_variable, output_class, nb_classes=6)

# sequence1_class0, ..., sequence20_class5 are arrays of shape (100, 10)

# input 1st sequence into dataset for class label 0
for i in range(100):
    trndata.appendLinked(sequence1_class0[i,:], [0])
trndata.newSequence()

# input 2nd sequence into dataset for class label 0
for i in range(100):
    trndata.appendLinked(sequence2_class0[i,:], [0])
trndata.newSequence()
# ......
# ......

# input 20th sequence into dataset for class label 5
for i in range(100):
    trndata.appendLinked(sequence20_class5[i,:], [5])
trndata.newSequence()

You could put all of them in a for loop eventually. trndata.newSequence() is called each time a new sample sequence is added to the dataset.
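One way the loop could look is sketched below. The helper `fill_dataset`, the mapping `sequences_by_class`, and the `RecordingDataSet` stand-in are all hypothetical names introduced for illustration; the stand-in only records calls so the sketch runs without pybrain. The helper assumes nothing about the dataset beyond the two methods used above, `appendLinked` and `newSequence`.

```python
def fill_dataset(dataset, sequences_by_class):
    """Append every sequence to `dataset`, one time step at a time."""
    for class_label, sequences in sequences_by_class.items():
        for sequence in sequences:
            for step in sequence:
                dataset.appendLinked(step, [class_label])
            # Same call order as in the snippet above.
            dataset.newSequence()


class RecordingDataSet:
    """Stand-in for illustration only; it just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def appendLinked(self, inp, target):
        self.calls.append(("appendLinked", list(inp), list(target)))

    def newSequence(self):
        self.calls.append(("newSequence",))


# Toy data: class 0 has one sequence of two steps, class 1 one sequence of one step.
toy = {0: [[[0.1, 0.2], [0.3, 0.4]]], 1: [[[0.5, 0.6]]]}
recorder = RecordingDataSet()
fill_dataset(recorder, toy)
```

With pybrain available, the same helper would presumably be called as `fill_dataset(trndata, sequences_by_class)`, where the mapping holds the 20 sequences for each of the six class labels.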

The training of the network should be similar to your existing code.
