
h2o autoencoders high error (h2o.mse)

I am trying to use h2o to create an autoencoder using its deeplearning function. I am feeding a data set of about 4000 rows by 50 columns to the deeplearning function (hidden = c(200)), then checking its reconstruction error with h2o.mse, and I am getting about 0.4, a fairly high value.

Is there any way to reduce that error by changing something in the deeplearning call?

I assume everything is left at the defaults, except for the single hidden layer with 200 nodes?

The first things to try are:

  • Use more epochs (or use less aggressive early stopping criteria)
  • Use a 2nd hidden layer
  • Use more nodes in your hidden layer(s)
  • Get more training data

Note that all of those will increase your training time.
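To see why those changes increase training time, it helps to count the weights the network has to fit. This is a back-of-the-envelope sketch in plain Python (no h2o needed; the `n_params` helper is just for illustration), comparing the original 50-200-50 autoencoder with a deeper 50-200-200-50 one:

```python
# Rough parameter count for a fully connected autoencoder:
# one weight per pair of units in consecutive layers,
# plus one bias per non-input unit.
def n_params(layers):
    weights = sum(a * b for a, b in zip(layers, layers[1:]))
    biases = sum(layers[1:])
    return weights + biases

# 50 inputs, one hidden layer of 200, 50 outputs (the question's setup)
print(n_params([50, 200, 50]))       # 20250

# Adding a 2nd hidden layer of 200 roughly triples the parameter count
print(n_params([50, 200, 200, 50]))  # 60450
```

With only ~4000 training rows, the single-hidden-layer model already has ~20k parameters, which is part of why more data (the last bullet) helps as much as a bigger network.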

You can use H2OGridSearch to find the autoencoder model with the smallest MSE. Below is an example in Python; the same grid search can be done in R with h2o.grid.

import h2o
from h2o.grid.grid_search import H2OGridSearch

def tuneAndTrain(hyperParameters, model, trainDataFrame):
    h2o.init()
    # H2OFrame accepts a pandas DataFrame directly and keeps the column names
    trainDataHex = h2o.H2OFrame(trainDataFrame)
    modelGrid = H2OGridSearch(model, hyper_params=hyperParameters)
    modelGrid.train(x=list(range(len(trainDataFrame.columns))),
                    training_frame=trainDataHex)
    # Sort ascending so that models[0] is the model with the *smallest* MSE
    gridperf = modelGrid.get_grid(sort_by='mse', decreasing=False)
    bestModel = gridperf.models[0]
    return bestModel

And you can call the above function to find and train the best model:

from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

hiddenOpt = [[50, 50], [100, 100], [5, 5, 5], [50, 50, 50]]
l2Opt = [1e-4, 1e-2]
hyperParameters = {"hidden": hiddenOpt, "l2": l2Opt}
bestModel = tuneAndTrain(
    hyperParameters,
    H2OAutoEncoderEstimator(activation="Tanh", ignore_const_cols=False, epochs=200),
    dataFrameTrainPreprocessed)
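Keep in mind that the grid trains one model per combination of hyperparameter values, so the search cost multiplies quickly. A quick sanity check in plain Python on the grid defined above:

```python
from itertools import product

hiddenOpt = [[50, 50], [100, 100], [5, 5, 5], [50, 50, 50]]
l2Opt = [1e-4, 1e-2]

# One model is trained per (hidden, l2) pair
combos = list(product(hiddenOpt, l2Opt))
print(len(combos))  # 8 models in this grid
```

Adding another list of, say, three activation functions would triple that to 24 models, so grow the grid deliberately.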
