
How to use LSTM for sequence classification using KerasClassifier

I have a binary classification problem where I need to predict potentially trendy/popular future products based on customer interactions during 2010-2015.

Currently, my dataset includes 1000 products and each product is labelled as 0 or 1 (i.e. binary classification). The label was decided based on customer interactions during 2016-2018.

I am calculating how centrality measures changed over time for each product during 2010-2015 as the features for my binary classification problem. For example, consider the figure below, which shows how degree centrality changed over time for each product.

[Figure: degree centrality of each product over time]

More specifically, I analyse the change of the following centrality measures as the features for my binary classification problem:

  • how degree centrality of each good changed from 2010-2016 (see the above figure)
  • how betweenness centrality of each good changed from 2010-2016
  • how closeness centrality of each good changed from 2010-2016
  • how eigenvector centrality of each good changed from 2010-2016

In a nutshell, my data looks as follows.

product, change_of_degree_centrality, change_of_betweenness_centrality, change_of_closeness_centrality, change_of_eigenvector_centrality, Label
item_1, [1.2, 2.5, 3.7, 4.2, 5.6, 8.8], [8.8, 4.6, 3.2, 9.2, 7.8, 8.6], …, 1
item_2, [5.2, 4.5, 3.7, 2.2, 1.6, 0.8], [1.5, 0, 1.2, 1.9, 2.5, 1.2], …, 0
and so on.
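
For reference, here is a minimal sketch (with hypothetical zero arrays in place of my real centrality values) of how I stack these per-measure sequences into the (1000, 6, 4) array used as input later:

import numpy as np

# Hypothetical per-measure arrays, each of shape (1000, 6):
# one row per product, one column per year
degree = np.zeros((1000, 6))
betweenness = np.zeros((1000, 6))
closeness = np.zeros((1000, 6))
eigenvector = np.zeros((1000, 6))

# Stack into a (1000, 6, 4) array: for each product, 6 time steps x 4 centrality measures
features = np.stack([degree, betweenness, closeness, eigenvector], axis=-1)
print(features.shape)  # (1000, 6, 4)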

I wanted to use a deep learning model to solve my issue. When reading tutorials, I realised that an LSTM suits my problem.

So, I am using the model below for my classification.

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(6, 4)))  # 6 is the length of the centrality sequence and 4 is the number of centrality types
                                         # (i.e. degree, betweenness, closeness, and eigenvector centrality)
model.add(Dense(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Since I have a small dataset, I wanted to perform 10-fold cross-validation. So, following this tutorial, I am using KerasClassifier as follows.

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

print(features.shape)  # (1000, 6, 4)
print(target.shape)    # (1000,)

# Create function returning a compiled network
def create_network():
    model = Sequential()
    model.add(LSTM(10, input_shape=(6,4)))
    model.add(Dense(32))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])     

    return model

# Wrap Keras model so it can be used by scikit-learn
neural_network = KerasClassifier(build_fn=create_network, 
                                 epochs=10, 
                                 batch_size=100, 
                                 verbose=0)

print(cross_val_score(neural_network, features, target, cv=5))

However, I noted that it is said to be wrong to use cross-validation with an LSTM (e.g., this tutorial, this question).

However, I am not clear whether this applies to me, as I am only doing a binary classification prediction to identify products that will be trendy/popular in the future (not forecasting).

I think the data in my problem setting is divided point-wise in the cross-validation, not time-wise.

i.e. (point-wise)

1st fold training:
item_1, item_2, ........, item_799, item_800

1st fold testing:
item_801, ........, item_1000

not (time-wise)

1st fold training:
2010, 2011, ........, 2015

1st fold testing:
2016, ........, 2018

For this reason, I am assuming that using cross-validation is appropriate for my problem.
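
To make this explicit, here is a minimal sketch (using a hypothetical zero array with the same shape as my features) showing that KFold splits along the first axis, i.e. over products:

import numpy as np
from sklearn.model_selection import KFold

# Hypothetical placeholder with the same shape as my features:
# 1000 products, each with a 6-step sequence of 4 centrality measures
features = np.zeros((1000, 6, 4))

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(features):
    # KFold splits along axis 0, i.e. over products (point-wise);
    # every product keeps its full centrality sequence intact
    print(len(train_idx), len(test_idx))  # 800 200 in each fold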

Please let me know a suitable way to use cross-validation for my problem and dataset.

NOTE: I am not limited to LSTM and am happy to explore other models as well.

I am happy to provide more details if needed.

There are many types of cross-validation, similar to how there are many types of neural networks. In your case you are trying to use k-fold cross-validation.

In the question you linked, it correctly states that k-fold cross-validation should not be used with time series data. You can't accurately evaluate your model if you train on some data and then test on data that occurred before the training data.

However, other forms of cross-validation (such as the mentioned sliding window or expanding window) will still work with your time series data. There is a utility in sklearn that splits the data using the expanding window method: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
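
As a rough illustration (a minimal sketch on a toy array, not your actual data), TimeSeriesSplit produces expanding-window folds like this:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy example: 6 observations ordered in time (e.g. one per year)
X = np.arange(6).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Each fold trains on an expanding window of past points
    # and tests on the points that immediately follow it
    print("train:", train_idx, "test:", test_idx)
# train: [0 1 2] test: [3]
# train: [0 1 2 3] test: [4]
# train: [0 1 2 3 4] test: [5]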

With all that said, I am not sure you are really using time series data here. If you simply have the centrality scores for each year as separate features, then the order of your data does not matter, since each item is only one data point (assuming that the scores of one item don't impact another). In that case you can use k-fold cross-validation and other networks that work with iid data. You could even use non-neural-network models such as SVMs or decision trees.
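
For example (a minimal sketch, assuming hypothetical placeholder arrays in place of your real features and labels), you could flatten each item's (6, 4) sequence into a single feature vector and run stratified k-fold cross-validation with an SVM:

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Hypothetical placeholders with the shapes described in the question
features = np.random.rand(1000, 6, 4)    # the real centrality sequences go here
target = np.random.randint(0, 2, 1000)   # the real 0/1 labels go here

# Flatten each item's (6, 4) sequence into a single 24-dimensional feature vector
X = features.reshape(len(features), -1)

svm = SVC(kernel='rbf')
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
print(cross_val_score(svm, X, target, cv=cv))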

Maybe you misunderstand the concept; KerasClassifier is suitable for an LSTM.

Based on the links you gave, they only say that cross-validation is not suited for time series data that grows row-wise (each new time step adding a new row/sample).

But in your setup the time series grows column-wise within each sample fed to the LSTM, so splitting across rows is fine.
