How to use LSTM for sequence classification using KerasClassifier
I have a binary classification problem where I need to predict potential future trendy/popular products based on customer interactions during 2010-2015.
Currently, my dataset includes 1000 products, and each product is labelled as 0 or 1 (i.e. binary classification). The label was decided based on customer interactions during 2016-2018.
I am calculating how centrality measures changed over time for each product during 2010-2015 as the features for my binary classification problem. For example, consider the below figure that shows how degree centrality changed over time for each product.
More specifically, I analyse the change of the following centrality measures as the features for my binary classification problem.
degree centrality: how it changed for each good from 2010-2016 (see the above figure)
betweenness centrality: how it changed for each good from 2010-2016
closeness centrality: how it changed for each good from 2010-2016
eigenvector centrality: how it changed for each good from 2010-2016
In a nutshell, my data looks as follows.
product, change_of_degree_centrality, change_of_betweenness_centrality, change_of_closeness_centrality, change_of_eigenvector_centrality, Label
item_1, [1.2, 2.5, 3.7, 4.2, 5.6, 8.8], [8.8, 4.6, 3.2, 9.2, 7.8, 8.6], …, 1
item_2, [5.2, 4.5, 3.7, 2.2, 1.6, 0.8], [1.5, 0, 1.2, 1.9, 2.5, 1.2], …, 0
and so on.
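To make the layout above concrete, here is a minimal sketch of how such per-product lists could be stacked into the (samples, time steps, features) array that an LSTM expects. The `records` dictionary and its key names are hypothetical, and the closeness and eigenvector series are made-up placeholders, because the data above elides them.

```python
import numpy as np

# Hypothetical records mirroring the layout above. The degree and betweenness
# values come from the question; the closeness and eigenvector values are
# made-up placeholders, since the question elides them.
records = {
    "item_1": {
        "degree":      [1.2, 2.5, 3.7, 4.2, 5.6, 8.8],
        "betweenness": [8.8, 4.6, 3.2, 9.2, 7.8, 8.6],
        "closeness":   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # placeholder
        "eigenvector": [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # placeholder
        "label": 1,
    },
    "item_2": {
        "degree":      [5.2, 4.5, 3.7, 2.2, 1.6, 0.8],
        "betweenness": [1.5, 0.0, 1.2, 1.9, 2.5, 1.2],
        "closeness":   [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # placeholder
        "eigenvector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # placeholder
        "label": 0,
    },
}

measures = ["degree", "betweenness", "closeness", "eigenvector"]

# For each product, put the four series side by side as columns, giving a
# (6 time steps, 4 measures) matrix, then stack the products along axis 0.
features = np.stack(
    [np.column_stack([rec[m] for m in measures]) for rec in records.values()]
)
target = np.array([rec["label"] for rec in records.values()])

print(features.shape)  # (number of products, 6, 4)
print(target.shape)    # (number of products,)
```

With all 1000 products loaded the same way, `features` would have the shape (1000, 6, 4).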
I wanted to use a deep learning model to solve my issue. When reading tutorials, I realised that LSTM suits my problem.
So, I am using the below mentioned model for my classification.
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
# input_shape=(6, 4): 6 is the length of the centrality sequence, 4 is the number of centrality types (degree, betweenness, closeness, eigenvector)
model.add(LSTM(10, input_shape=(6, 4)))
model.add(Dense(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Since I have a small dataset, I wanted to perform 10-fold cross-validation. So, I am using KerasClassifier as follows, by following this tutorial.
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

print(features.shape)  # (1000, 6, 4)
print(target.shape)  # (1000,)

# Create function returning a compiled network
def create_network():
    model = Sequential()
    model.add(LSTM(10, input_shape=(6, 4)))
    model.add(Dense(32))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Wrap Keras model so it can be used by scikit-learn
neural_network = KerasClassifier(build_fn=create_network,
                                 epochs=10,
                                 batch_size=100,
                                 verbose=0)

print(cross_val_score(neural_network, features, target, cv=5))
However, I noted that it is wrong to use cross validation with LSTM (e.g., this tutorial, this question).
However, I am not clear if this is applicable to me, as I am only doing a binary classification prediction to identify products that would be trendy/popular in the future (not forecasting).
I think the data in my problem setting is divided point-wise in the cross-validation, not time-wise.
i.e. (point-wise)
1st fold training:
item_1, item_2, ........, item_799, item_800
1st fold testing:
item_801, ........, item_1000
not (time-wise)
1st fold training:
2010, 2011, ........, 2015
1st fold testing:
2016, ........, 2018
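The point-wise split described above can be sketched with scikit-learn's `StratifiedKFold`. This is only an illustration under assumptions: the `features` and `target` arrays are random stand-ins with the shapes from the question, and stratification is my addition to keep the 0/1 ratio similar in every fold. Each fold selects whole products, so every product keeps all 6 of its time steps and no product's sequence is cut in time.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Random stand-in data with the shapes from the question (assumption:
# the real centrality sequences and labels would be used instead).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 6, 4))
target = rng.integers(0, 2, size=1000)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_shapes = []
overlaps = []
# split() only needs the number of samples and the labels, so a dummy
# 1-D array stands in for X here.
for train_idx, test_idx in skf.split(np.zeros(len(target)), target):
    # Index along axis 0 only: products are split, time steps are not.
    X_train, X_test = features[train_idx], features[test_idx]
    fold_shapes.append((X_train.shape, X_test.shape))
    overlaps.append(len(set(train_idx) & set(test_idx)))

for train_shape, test_shape in fold_shapes:
    print(train_shape, test_shape)
```

Every printed shape ends in (6, 4), which shows that the folds differ only in which products they contain.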
Due to this fact, I am assuming that using cross validation is correct in my problem.
Please let me know a suitable way to use cross-validation according to my problem and dataset.
NOTE: I am not limited to LSTM and happy to explore other models as well.
I am happy to provide more details if needed.
There are many types of cross validation, similar to how there are many types of neural networks. In your case you are trying to use k-fold cross validation.
In the question you linked, it correctly states that k-fold cross validation should not be used with time series data. You can't accurately evaluate your model if you are training on data and then testing on data that occurred before the training data.
However, other forms of cross validation (such as the mentioned sliding window or expanding window) will still work with your time series data. There is a function in sklearn that splits the data using the expanding window method.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
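As a rough illustration of the expanding window method, the sketch below runs `TimeSeriesSplit` over a hypothetical array of yearly time points (2010 to 2018, chosen only to echo the years in the question). It assumes the rows are ordered in time, which is the situation where this splitter applies.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical time-ordered samples: one row per year, oldest first.
years = np.arange(2010, 2019)

tscv = TimeSeriesSplit(n_splits=4)

splits = []
for train_idx, test_idx in tscv.split(years):
    train_years = years[train_idx].tolist()
    test_years = years[test_idx].tolist()
    splits.append((train_years, test_years))
    print("train:", train_years, "test:", test_years)
```

In every split the training years all come before the test years, and the training window grows from one split to the next, so the model is never evaluated on data older than what it was trained on.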
With all that said, I am not sure if you are really using time series data. If you simply have the centrality scores for each year as a separate feature, then the order of your data does not matter since each item is only one data point (assuming that the scores of one item don't impact another). In that case you can use k-fold cross validation and other networks that work with iid data. You could even use non neural networks such as SVMs or decision trees.
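A minimal sketch of that last suggestion, under assumptions: each product's 6 x 4 centrality matrix is flattened into 24 independent features, and an SVM and a decision tree are scored with 10-fold cross validation. The data here is random stand-in data with the shapes from the question, so the scores themselves are meaningless. The scaler, `max_depth=3`, and the other settings are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Random stand-in data with the shapes from the question (assumption).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 6, 4))
target = rng.integers(0, 2, size=1000)

# Treat every (year, measure) pair as its own feature: (1000, 6, 4) -> (1000, 24).
X_flat = features.reshape(len(features), -1)

models = {
    # SVMs are sensitive to feature scale, so scale inside the pipeline;
    # this way the scaler is fitted on the training folds only.
    "svm": make_pipeline(StandardScaler(), SVC()),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {
    name: cross_val_score(model, X_flat, target, cv=10)
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(name, fold_scores.mean())
```

Flattening discards the explicit ordering of the years, which is acceptable only if, as assumed above, each year's score is treated as a separate feature.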
Maybe you misunderstand the concept: KerasClassifier is suitable for LSTM.
Based on the links you gave, they only say that cross validation is not suitable for time series that grow row-wise (each new row is a later point in time),
but in your LSTM setup the time series grows column-wise (the time steps sit inside each sample).