How to use time series data for classification in sklearn

Question

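I have a time series dataset like the one below, where 2 time series are recorded for each sensor. The Label column indicates whether the sensor is faulty (i.e. 0 or 1).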

sensor, time-series 1, time-series 2, Label
x1, [38, 38, 35, 33, 32], [18, 18, 12, 11, 09], 1
x2, [33, 32, 35, 36, 32], [13, 12, 15, 16, 12], 0
and so on ..

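At the moment, I am extracting different features from the two time series (e.g. min, max, median, slope) and plan to use them in a random forest classifier in sklearn for classification.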

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# input_file: path to the CSV; myfeatures: list of feature column names
df = pd.read_csv(input_file)
X = df[myfeatures]
y = df['Label']

# Random forest classifier ('sqrt' replaces max_features='auto', which scikit-learn 1.3 removed)
clf = RandomForestClassifier(random_state=42, class_weight="balanced", criterion='gini',
                             max_depth=3, max_features='sqrt', n_estimators=500)

k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

output = cross_validate(clf, X, y, cv=k_fold, scoring='roc_auc', return_estimator=True)

# Print the feature importances of the estimator fitted on each fold
for idx, estimator in enumerate(output['estimator']):
    print("Features sorted by their score for estimator {}:".format(idx))
    feature_temp_importances = pd.DataFrame(estimator.feature_importances_,
                                            index=myfeatures,
                                            columns=['importance']).sort_values('importance', ascending=False)
    print(feature_temp_importances)
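
For reference, the summary features mentioned above (min, max, median, slope) could be computed along the following lines. This is only a sketch: the helper `summarize`, the `ts1_`/`ts2_` column prefixes, and the choice of a least-squares line fit for the slope are assumptions for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

def summarize(series):
    """Return min, max, median and least-squares slope of one time series."""
    values = np.asarray(series, dtype=float)
    t = np.arange(len(values))
    slope = np.polyfit(t, values, 1)[0]  # coefficient of the fitted straight line
    return {
        "min": values.min(),
        "max": values.max(),
        "median": float(np.median(values)),
        "slope": float(slope),
    }

# The two sensors shown in the question (hypothetical in-memory layout)
raw = {
    "x1": ([38, 38, 35, 33, 32], [18, 18, 12, 11, 9]),
    "x2": ([33, 32, 35, 36, 32], [13, 12, 15, 16, 12]),
}

rows = {}
for sensor, (ts1, ts2) in raw.items():
    feats = {f"ts1_{k}": v for k, v in summarize(ts1).items()}
    feats.update({f"ts2_{k}": v for k, v in summarize(ts2).items()})
    rows[sensor] = feats

# One row per sensor, one column per summary feature
features = pd.DataFrame.from_dict(rows, orient="index")
print(features)
```

The resulting frame has one row per sensor and eight feature columns, which is the shape `X` is expected to have in the cross-validation code above.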

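However, my results are poor. I would like to know whether it is possible to give the raw time series data to the random forest classifier, for example by passing [38, 38, 35, 33, 32], [18, 18, 12, 11, 09] as the features of x1. If that is possible, how can I do it in sklearn?

I am happy to provide more details if needed.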

Answer 1

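If you want to feed the whole time series to the model and use it to make predictions, you should try an RNN.

If you want to stay with sklearn, another option is to apply a rolling mean or rolling std to your time series, so that x at time t is influenced by x at time t-1, and so on. With this correlation you can classify each point into a specific class, and then classify the whole time series according to the majority label of its points.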
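
A minimal sketch of the rolling-statistics idea, using pandas on the first series from the question. The window size of 3 and the column names are assumptions chosen for illustration; the answer does not specify them.

```python
import pandas as pd

# First time series of sensor x1 from the question
ts = pd.Series([38, 38, 35, 33, 32], dtype=float)

window = 3  # assumed window size; tune for real data

# Each row describes one time point together with its recent history
rolled = pd.DataFrame({
    "value": ts,
    "roll_mean": ts.rolling(window).mean(),
    "roll_std": ts.rolling(window).std(),
})

# The first window-1 rows have no full window and are NaN, so drop them
rolled = rolled.dropna().reset_index(drop=True)
print(rolled)
```

Each remaining row could then be treated as one training sample carrying its sensor's label, with the per-point predictions combined by majority vote to label the whole series.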

Answer 2

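Yes, you can use the entire time series as features for your classifier.

To do that, just use the raw data: concatenate the 2 time series of each sensor and feed the result into the classifier.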

from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.ensemble import RandomForestClassifier
import numpy as np

n_samples = 100

# Generate 2 sets of n_samples random time series (length 5) with integer values from 0 to 99.
x1 = np.array([np.random.randint(0, 100, 5) for _ in range(n_samples)])
x2 = np.array([np.random.randint(0, 100, 5) for _ in range(n_samples)])

# Concatenate the two series of each sample into one row of 10 features.
X = np.hstack((x1, x2))

# Generate n_samples random binary labels.
y = np.random.randint(0, 2, n_samples)

# Random forest classifier ('sqrt' replaces max_features='auto', which scikit-learn 1.3 removed)
clf = RandomForestClassifier(random_state=42, class_weight="balanced", criterion='gini',
                             max_depth=3, max_features='sqrt', n_estimators=500)

k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

output = cross_validate(clf, X, y, cv=k_fold, scoring='roc_auc', return_estimator=True)
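
Applied to the layout in the question, where each cell holds a whole series, the same concatenation might look like the sketch below. The column names `ts1` and `ts2`, the in-memory DataFrame, and the `feature_names` labels are assumptions for illustration; all series are assumed to have the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory version of the table from the question
df = pd.DataFrame({
    "sensor": ["x1", "x2"],
    "ts1": [[38, 38, 35, 33, 32], [33, 32, 35, 36, 32]],
    "ts2": [[18, 18, 12, 11, 9], [13, 12, 15, 16, 12]],
    "Label": [1, 0],
})

# Turn each list-valued column into a 2-D array (rows = sensors, columns = time steps)
ts1 = np.array(df["ts1"].tolist())
ts2 = np.array(df["ts2"].tolist())

# One row per sensor: the 5 values of series 1 followed by the 5 values of series 2
X = np.hstack((ts1, ts2))
y = df["Label"].to_numpy()

# Readable names for the columns, useful when inspecting feature importances
feature_names = [f"ts1_t{i}" for i in range(ts1.shape[1])] + \
                [f"ts2_t{i}" for i in range(ts2.shape[1])]
print(X.shape, feature_names)
```

With `X` and `y` built this way, the classifier and cross-validation calls above can be used unchanged, provided there are enough samples per class for the chosen number of folds.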

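However, you may not want to use a random forest with these features. Have a look at LSTMs or even 1-D CNNs, which may be better suited to this approach of using the whole time series as input.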

2 answers

Answer 1
1 2019-08-06 07:45:12

Answer 2
1 accepted 2019-08-06 07:55:28
