Hyperparameter tuning in Keras using nested k-fold cross-validation
RandomizedSearchCV only accepts a 1-D target variable, but for this binary classification I need to convert y_train and y_test to one-hot variables for Keras. I get the error "Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead." Can anyone give me some hints? Thanks a lot!
from keras.models import Sequential
from keras.layers import (Conv1D, Dense, Dropout, Flatten,
                          GlobalAveragePooling1D, LSTM, MaxPooling1D, Reshape)

def create_baseline():
    model = Sequential()
    model.add(Reshape((TIME_PERIODS, num_sensors), input_shape=(input_shape,)))
    model.add(Conv1D(100, 6, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
    # model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(MaxPooling1D(3))
    model.add(Conv1D(100, 6, activation='relu'))
    model.add(Dropout(0.5))
    model.add(MaxPooling1D(3))
    # LSTM
    model.add(LSTM(64, return_sequences=True))
    model.add(Dropout(0.5))
    model.add(LSTM(32, return_sequences=True))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation="sigmoid", kernel_initializer="uniform"))
    model.add(Dropout(0.5))
    model.add(GlobalAveragePooling1D())
    model.add(Flatten())
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model
from sklearn.model_selection import StratifiedKFold, KFold
from sklearn.model_selection import cross_val_score
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils

seed = 42
# y_train = np_utils.to_categorical(y_train, num_classes)
estimator = KerasClassifier(build_fn=create_baseline, epochs=30, batch_size=800, verbose=1)

# Nested k-fold cross-validation (subject-dependent)
from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold, RandomizedSearchCV

# train/validation/test = 0.8/0.2/0.2
inner_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

accuracy = []
p_grid = []
estimators = []
# p_grid = {'batch_size': [400, 800]}

from sklearn.preprocessing import LabelEncoder
# def get_new_labels(y):
#     y = LabelEncoder().fit_transform([''.join(str(l)) for l in y])
#     return y
# y = get_new_labels(y)

for train_index, test_index in outer_cv.split(x, y):
    print('Train Index:', train_index, '\n')
    print('Test Index:', test_index)

    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    y_train = np_utils.to_categorical(y_train, num_classes)
    y_test = np_utils.to_categorical(y_test, num_classes)

    grid = RandomizedSearchCV(estimator=estimator,
                              param_distributions=p_grid,
                              cv=inner_cv,
                              refit='roc_auc_scorer',
                              return_train_score=True,
                              verbose=1, n_jobs=-1, n_iter=20)
    grid.fit(x_train, y_train)

    estimators.append(grid.best_estimator_)
    prediction = grid.predict(x_test)
    accuracy.append(grid.score(x_test, y_test))
    print('Accuracy: {}'.format(accuracy))
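The quoted error can be reproduced without Keras at all: scikit-learn's StratifiedKFold refuses one-hot (indicator) targets, which is exactly what y_train becomes after to_categorical. A minimal sketch with made-up toy data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

x = np.arange(16, dtype=float).reshape(8, 2)   # toy features
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])         # 1-D binary labels

cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
list(cv.split(x, y))                 # fine: target type is 'binary'

y_onehot = np.eye(2)[y]              # same labels, one-hot encoded
try:
    list(cv.split(x, y_onehot))      # raises ValueError: 'multilabel-indicator'
except ValueError as e:
    print(e)
```

One way out is to keep y 1-D everywhere scikit-learn sees it (splitting, fitting, scoring) and only one-hot encode, if at all, where Keras itself needs it.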
In binary classification, an example is either a dog or not a dog, and your encoded labels are just 1s or 0s:
[[0] <- single row label
[1] <- single row label
[0]]
In multiclass classification, an example can be a dog, a cat, or a bird, but never more than one of them, i.e. the classes are mutually exclusive. So your encoded labels look like:
[[0,0,1] <-- a single row's encoded label
[1,0,0] <-- another row's encoded label
[0,1,0]]
Multi-label classification is different: it accepts sets of labels that are not mutually exclusive, i.e. an example can be a building, and also a house, and also an office:
[[1,1,1]
[0,0,1]
[1,0,1]]
The problem here is that you appear to be passing multilabel targets to your classifier. You should double-check your labels and make sure each row of training data contains only a single 1 or 0 (if that is what you need).
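scikit-learn exposes the check that produces this error, so you can inspect your labels directly before fitting. A quick sketch:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_flat = np.array([0, 1, 0])                    # plain 0/1 labels
y_onehot = np.array([[1, 0], [0, 1], [1, 0]])   # same labels, one-hot encoded

print(type_of_target(y_flat))    # 'binary'
print(type_of_target(y_onehot))  # 'multilabel-indicator'
```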
Using to_categorical for binary classification is fine, but you may want to double-check that num_classes=2 for binary classification.
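To see why the shape matters, here is a NumPy sketch of what to_categorical does to 1-D integer labels (a stand-in, not the real keras.utils implementation): the result is a 2-D indicator matrix, which is precisely the 'multilabel-indicator' shape scikit-learn complains about.

```python
import numpy as np

def to_categorical_sketch(y, num_classes):
    """NumPy stand-in for keras.utils.to_categorical (integer labels only)."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(y)]

labels = [0, 1, 0]
onehot = to_categorical_sketch(labels, num_classes=2)
print(onehot.shape)  # (3, 2): 2-D indicator matrix, no longer 1-D 'binary'
```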
Also, if it is a binary classification problem, your final Dense layer's activation needs to be 'sigmoid' instead of 'softmax'. See here for notes.
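The sigmoid suggestion works because, for two classes, a single sigmoid unit carries the same information as a 2-way softmax. A small numerical check of that equivalence:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# For logits [z0, z1], the softmax probability of class 1 equals
# sigmoid(z1 - z0). So Dense(1, activation='sigmoid') trained on 1-D
# 0/1 labels is equivalent to Dense(2, activation='softmax') on one-hot
# labels, while keeping the targets in the shape sklearn accepts.
logits = np.array([[0.3, 1.2], [-0.5, 0.1]])
p_softmax = softmax(logits)[:, 1]
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])
print(np.allclose(p_softmax, p_sigmoid))  # True
```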