
How to use SMAC for hyper-parameter optimization of a Convolutional Neural Network?

Note: long post. Please bear with me.

I have implemented a convolutional neural network in PyTorch on the KMNIST dataset. I need to use SMAC to optimize the learning rate and the momentum of the Stochastic Gradient Descent (SGD) optimizer of the CNN. I am new to hyperparameter optimization, and what I learnt from the SMAC documentation is:

  1. SMAC evaluates the algorithm to be optimized by invoking it through a Target Algorithm Evaluator (TAE).
  2. We need a Scenario object to configure the optimization process.
  3. The run_obj parameter in the Scenario object specifies what SMAC is supposed to optimize. (A minimal sketch of this workflow is shown below.)
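
For reference, here is a minimal sketch of that workflow, modeled on the SVM example in the SMAC3 repository; the hyperparameter x and the quadratic cost are purely illustrative, not part of my code:

from smac.configspace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter
from smac.scenario.scenario import Scenario
from smac.facade.smac_facade import SMAC

# Define the search space
cs = ConfigurationSpace()
cs.add_hyperparameter(UniformFloatHyperparameter('x', 0.0, 1.0, default_value=0.5))

# The TAE: takes a configuration and returns a single cost to minimize
def cost_from_cfg(cfg):
    x = cfg['x']
    return (x - 0.3) ** 2  # illustrative scalar cost

scenario = Scenario({"run_obj": "quality",   # minimize the cost returned by the TAE
                     "runcount-limit": 10,   # maximum number of TAE calls
                     "cs": cs,
                     "deterministic": "true"})

smac = SMAC(scenario=scenario, tae_runner=cost_from_cfg)
incumbent = smac.optimize()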

My ultimate goal is to get a good accuracy or a low loss.

This is what I have done so far:

Convolutional Neural Network

import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms 
import torchvision.datasets as datasets
from torch.autograd import Variable
from datasets import * # custom local module that provides the KMNIST dataset wrapper used below
import torch.utils.data
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Create the model class

class CNN(nn.Module):
    def __init__(self):

        super(CNN, self).__init__() # to inherit the features of nn.Module

        self.cnn1 = nn.Conv2d(in_channels = 1, out_channels = 8, kernel_size = 3, stride = 1, padding =1)

        # in_channels = 1 because of grayscale images
        # kernel_size = filter size
        # padding = 1 because "same" padding = (filter_size - 1)/2
        # the output size of each of the 8 feature maps is [(input_size - filter_size + 2*padding)/stride] + 1

        #Batch Normalization

        self.batchnorm1 = nn.BatchNorm2d(8)

        # RELU

        self.relu = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size =2)

        # After maxpooling, the output of each feature map is 28/2 =14

        self.cnn2 = nn.Conv2d(in_channels = 8, out_channels = 32, kernel_size = 5, stride = 1, padding =2)

        #Batch Normalization

        self.batchnorm2 = nn.BatchNorm2d(32)

        # RELU

        #self.relu = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size =2)

        # After maxpooling, the output of each feature map is 14/2 = 7
        # Flatten the feature maps. You have 32 feature maps, each of size 7x7 --> 32*7*7 = 1568
        self.fc1 = nn.Linear(in_features=1568, out_features = 600)
        self.dropout = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(in_features=600, out_features = 10)

    def forward(self,x):

        out = self.cnn1(x)
        #out = F.relu(self.cnn1(x))
        out = self.batchnorm1(out)
        out = self.relu(out)
        out = self.maxpool1(out)

        out = self.cnn2(out)
        out = self.batchnorm2(out)
        out = self.relu(out)
        out = self.maxpool2(out)

        #Now we have to flatten the output. This is where we apply the feed forward neural network as learned
        #before!

        #It will then take the shape (batch_size, 1568) = (100, 1568)

        out = out.view(-1, 1568)

        #Then we forward through our fully connected layer

        out = self.fc1(out)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)

        return out

def train(model, train_loader, optimizer, epoch, CUDA, loss_fn):
        model.train()
        cum_loss=0
        iter_count = 0

        for i, (images, labels) in enumerate(train_loader): # use the train_loader argument, not the global

            if CUDA:

               images = Variable(images.cuda())
               images = images.unsqueeze(1)
               images = images.type(torch.FloatTensor)
               images = images.cuda()

               labels = Variable(labels.cuda())
               labels = labels.type(torch.LongTensor)
               labels = labels.cuda()

            else:

               images = Variable(images)
               images = images.unsqueeze(1)
               images = images.type(torch.FloatTensor)  # match the model's float32 weights

               labels = Variable(labels)
               labels = labels.type(torch.LongTensor)   # CrossEntropyLoss expects LongTensor targets

            optimizer.zero_grad()
            outputs = model(images)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
            cum_loss += loss.item()  # .item() so the autograd graph is not kept around


            if (i+1) % batch_size == 0:
               correct = 0
               total = 0
               acc = 0
               _, predicted = torch.max(outputs.data,1)
               total += labels.size(0)
               if CUDA:
                  correct += (predicted.cpu()==labels.cpu()).sum()
               else:
                  correct += (predicted==labels).sum()

               accuracy = 100*correct/total

            if i % len(train_loader) == 0:

               iter_count += 1
               ave_loss = cum_loss/batch_size
        return ave_loss

batch_size = 100 
epochs = 5
e = range(epochs)
#print(e)

#Load datasets

variable_name=KMNIST()

train_images = variable_name.images
train_images = torch.from_numpy(train_images)

#print(train_images.shape)
#print(type(train_images))

train_labels = variable_name.labels
train_labels = torch.from_numpy(train_labels)

#print(train_labels.shape)
#print(type(train_labels))

train_dataset = torch.utils.data.TensorDataset(train_images, train_labels)

# Make the dataset iterable

train_load = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)

print('There are {} images in the training set'.format(len(train_dataset)))
print('There are {} batches in the training loader'.format(len(train_load)))



def net(learning_rate, Momentum):
    model = CNN()
    CUDA = torch.cuda.is_available()
    if CUDA:
        model = model.cuda()

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate,momentum = Momentum, nesterov= True)

    iteration = 0
    total_loss=[]

    for epoch in range(epochs):
        ave_loss = train(model, train_load, optimizer, epoch, CUDA, loss_fn)

        total_loss.append(ave_loss)

    return optimizer, loss_fn, model, total_loss

optimizer, loss_fn, model, total_loss = net(learning_rate= 0.01, Momentum = 0.09)

# Print model's state_dict

print("---------------")

print("Model's state_dict:")

for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("---------------")

#print("Optimizer's state_dict:")

#for var_name in optimizer.state_dict():
 #   print(var_name, "\t", optimizer.state_dict()[var_name])

torch.save(model.state_dict(), "kmnist_cnn.pt")

plt.plot(e, (np.array(total_loss)))
plt.xlabel("# Epoch")
plt.ylabel("Loss")
plt.show()

print('Done!')

SMAC hyperparameter optimization

from smac.configspace import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
    UniformFloatHyperparameter, UniformIntegerHyperparameter

from smac.configspace.util import convert_configurations_to_array
#from ConfigSpace.conditions import InCondition

# Import SMAC-utilities
from smac.tae.execute_func import ExecuteTAFuncDict
from smac.scenario.scenario import Scenario
from smac.facade.smac_facade import SMAC

# Build Configuration Space which defines all parameters and their ranges
cs = ConfigurationSpace()

# Define the hyperparameters to optimize and add them to the configuration space

lr = UniformFloatHyperparameter('learning_rate', 1e-4, 1e-1, default_value=1e-2)
momentum = UniformFloatHyperparameter('Momentum', 0.01, 0.1, default_value=0.09)

cs.add_hyperparameters([lr, momentum])

def kmnist_from_cfg(cfg):

    cfg = {k : cfg[k] for k in cfg if cfg[k]}
    print('Config is', cfg)

    #optimizer, loss_fn, model, total_loss = net(**cfg)
    #optimizer, loss_fn, model, total_loss = net(learning_rate= cfg["learning_rate"], Momentum= cfg["Momentum"])

    optimizer, loss_fn, model, total_loss = net(learning_rate= 0.02, Momentum= 0.05)

    return optimizer, loss_fn, model, total_loss

# Scenario object
scenario = Scenario({"run_obj": "quality",   # we optimize quality (alternatively runtime)
                     "runcount-limit": 200,  # maximum function evaluations
                     "cs": cs,               # configuration space
                     "deterministic": "true"
                     })

#def_value = kmnist_from_cfg(cs.get_default_configuration())
#print("Default Value: %.2f" % (def_value))


# Optimize, using a SMAC-object

print("Optimizing! Depending on your machine, this might take a few minutes.")
smac = SMAC(scenario=scenario,tae_runner=kmnist_from_cfg) #rng=np.random.RandomState(42)
smac.solver.intensifier.tae_runner.use_pynisher = False

print("SMAC", smac)
incumbent = smac.optimize()


inc_value = kmnist_from_cfg(incumbent)

print("Optimized Value: %.2f" % (inc_value))

When I give loss as the run_obj parameter, I get the error message:

ArgumentError: argument --run-obj/--run_obj: invalid choice: 'total_loss' (choose from 'runtime', 'quality')

To be honest, I do not know what "quality" means. Anyway, when I give quality as the run_obj parameter, I get the error message:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

If I understood it correctly, the above error message appears when a number is expected but something else is given. To check whether the problem was with the configuration space, I tried

optimizer, loss_fn, model, total_loss = net(learning_rate= 0.02, Momentum= 0.05)

instead of these:

optimizer, loss_fn, model, total_loss = net(**cfg)
optimizer, loss_fn, model, total_loss = net(learning_rate= cfg["learning_rate"], Momentum= cfg["Momentum"])

The error remains the same.

Any ideas on how to use SMAC to optimize the hyperparameters of a CNN, and why do I get this error message? I tried looking for similar problems online. This post was a little helpful. Unfortunately, since there is no implementation of SMAC on a NN (at least I did not find one), I cannot figure out the solution. I have run out of ideas.

Any help, ideas or useful links are appreciated.

Thank you!

I believe the tae_runner (kmnist_from_cfg in your case) has to be a callable that takes a configuration-space point, which you correctly provide, and outputs a single number: with run_obj set to "quality", SMAC minimizes that returned cost. You output a tuple of things. Perhaps only return the total_loss on the validation set? I am basing this on the SVM example in the SMAC GitHub repository at https://github.com/automl/SMAC3/blob/master/examples/svm.py. A sketch of the fix is shown below.
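
For illustration, here is a minimal sketch of the suggested fix, assuming you use the average loss of the last epoch as the cost (a loss computed on a held-out validation set would be a better choice):

def kmnist_from_cfg(cfg):
    cfg = {k: cfg[k] for k in cfg if cfg[k]}
    print('Config is', cfg)

    # Train with the sampled hyperparameters
    optimizer, loss_fn, model, total_loss = net(learning_rate=cfg["learning_rate"],
                                                Momentum=cfg["Momentum"])

    # SMAC minimizes the returned value, so return a single float
    # (here the average loss of the last epoch; a validation loss would be better)
    return float(total_loss[-1])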
