
What can I do to help make my TensorFlow network overfit a large dataset?

The reason I am trying to overfit specifically is that I am following the steps for designing a network from "Deep Learning with Python" by François Chollet. This matters because it is for the final project of my degree.

At this stage, I need to make a network large enough to overfit my data in order to determine its maximal capacity, an upper bound on the size of the networks that I will optimise.

However, as the title suggests, I am struggling to make my network overfit. Perhaps my approach is naïve, but let me explain my model:

I am using this dataset to train a model to classify stars. Each star must be classified into two classes simultaneously: its spectral class (100 classes) and its luminosity class (10 classes). For example, our sun is a 'G2V': its spectral class is 'G2' and its luminosity class is 'V'.
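To make the two-target setup concrete, here is a minimal sketch of splitting a combined label like 'G2V' into its two parts. The function name and the assumption that luminosity classes are trailing Roman numerals are mine, not from the dataset:

```python
def split_star_class(label):
    """Split a combined class like 'G2V' into (spectral, luminosity).

    Assumes the luminosity class is a trailing run of Roman-numeral
    characters (I, V, X) and everything before it is the spectral class.
    """
    i = len(label)
    while i > 0 and label[i - 1] in "IVX":
        i -= 1
    return label[:i], label[i:]

print(split_star_class("G2V"))  # ('G2', 'V')
```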

To this end, I have built a double-headed network. It takes this input data: DataFrame containing input data

It then splits into two parallel networks.

import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model

# Create our input layer (shape must be a tuple, hence the trailing comma):
input = keras.Input(shape=(3,), name='observation_data')

# Build our spectral class
s_class_branch = layers.Dense(100000, activation='relu', name='s_class_branch_dense_1')(input)
s_class_branch = layers.Dense(500, activation='relu', name='s_class_branch_dense_2')(s_class_branch)

# Spectral class prediction
s_class_prediction = layers.Dense(100, 
                                  activation='softmax', 
                                  name='s_class_prediction')(s_class_branch)

# Build our luminosity class
l_class_branch = layers.Dense(100000, activation='relu', name='l_class_branch_dense_1')(input)
l_class_branch = layers.Dense(500, activation='relu', name='l_class_branch_dense_2')(l_class_branch)

# Luminosity class prediction
l_class_prediction = layers.Dense(10, 
                                  activation='softmax', 
                                  name='l_class_prediction')(l_class_branch)

# Now we instantiate our model using the layer setup above
scaled_model = Model(input, [s_class_prediction, l_class_prediction])

optimizer = keras.optimizers.RMSprop(learning_rate=0.004)

scaled_model.compile(optimizer=optimizer,
                     loss={'s_class_prediction': 'categorical_crossentropy',
                           'l_class_prediction': 'categorical_crossentropy'},
                     metrics=['accuracy'])

logdir = os.path.join("logs", "2raw100k")
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

scaled_model.fit(
    input_data,{
        's_class_prediction':spectral_targets,
        'l_class_prediction':luminosity_targets
        },
    epochs=20, 
    batch_size=1000,
    validation_split=0.0,
    callbacks=[tensorboard_callback])

In the code above you can see me attempting a model with two hidden layers in both branches: one layer with a shape of 100,000, feeding into another layer of 500, before reaching the output layer. The training targets are one-hot encoded, so there is one node for every class.
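For completeness, this is the shape the one-hot targets take. A minimal NumPy sketch with hypothetical integer labels (not my actual data), showing one column per class and a single 1 per row:

```python
import numpy as np

# Hypothetical integer class labels for 5 stars (indices into the
# 100 spectral classes):
labels = np.array([0, 42, 7, 42, 99])
num_classes = 100

# One-hot encode: row i is all zeros except a 1 at column labels[i].
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)        # (5, 100)
print(one_hot[1].argmax())  # 42
```

Keras's `tf.keras.utils.to_categorical` produces the same encoding.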

I have tried a wide range of sizes with one to four hidden layers, ranging from a shape of 500 to 100,000, only stopping because I ran out of RAM. I have only used dense layers, with the exception of trying a normalisation layer, to no effect.
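What that normalisation step amounts to, in case it is unclear: standardising each input feature to zero mean and unit variance. A NumPy sketch with made-up feature values (equivalent to what `keras.layers.Normalization` computes after `adapt()`):

```python
import numpy as np

# Hypothetical raw features for 4 stars; the columns are illustrative,
# not the dataset's actual columns:
X = np.array([[5800.0, 1.0, 1.0],
              [3500.0, 0.01, 0.3],
              [9700.0, 25.0, 1.8],
              [30000.0, 5000.0, 6.6]])

# Standardise each feature (column) to zero mean, unit variance:
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(np.allclose(X_scaled.mean(axis=0), 0))  # True
```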

Graph of losses

They will all happily train and slowly lower the loss, but they never seem to overfit. I have run networks out to 100 epochs and they still will not overfit.

What can I do to make my network fit the data better? I am fairly new to machine learning, having only been doing this for a year now, so I am sure there is something that I am missing. I really appreciate any help and would be happy to provide the logs shown in the graph.

Alright, so after a lot more training I think I have this answered. Basically, the network did not have adequate capacity and needed more layers. I had tried more layers earlier, but because I was not comparing against validation data, the overfitting was not apparent!

The proof is in the pudding.

So thank you to @Aryagm for their comment, because that let me work it out. As you can see, the validation data (grey and blue) clearly overfits, while the training data (green and orange) does not show it.

If anything, this goes to show why a separate validation set is so important and I am a fool for not having used it in the first place. Lesson learned.
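For anyone else hitting this: the fix is simply to hold out validation data, e.g. by passing `validation_split=0.2` to `fit()` instead of `0.0`. A NumPy sketch of what that hold-out amounts to, with hypothetical stand-in arrays rather than my actual dataset (note that Keras takes the *last* fraction of the data without shuffling, whereas this sketch shuffles first):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for input_data / targets: 1000 samples, 3 features.
X = rng.normal(size=(1000, 3))
y = rng.integers(0, 100, size=1000)

# Shuffle indices, then hold out the last 20% as a validation set.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, val_idx = idx[:split], idx[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(len(X_train), len(X_val))  # 800 200
```

Training loss alone cannot reveal overfitting; only the gap between training and validation loss can.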
