
Keras optimizer state when dataset won't fit in memory

I have a neural net and I'm training it on a very large amount of data. The data won't fit into my computer's memory, so I have to break it up and load it in chunks. So rather than use Keras' built-in epoch counter like:

    model.fit(x=X,y=Y,epochs=20)

I'm writing explicit for-loops for training like:

    for i in range(num_epochs):                  # num_epochs is my explicit counter for epochs
        shuffle(datachunks)                      # pseudocode: reshuffle the chunk order each epoch
        for X_chunk, Y_chunk in datachunks:      # iterate over the chunks one at a time
            model.fit(x=X_chunk, y=Y_chunk, epochs=1)

My question involves learning rate decay. I know of two ways to implement learning rate decay in Keras. One is to set it in the optimizer, like:

    keras.optimizers.Adam(lr=0.001,decay=1e-6)

Here the decay is supposedly applied at "each update" (which I'm guessing means each batch? This is a secondary question I have; I haven't quite figured out what exactly the decay schedule is here). The second way I know of is a learning rate scheduler passed in through callbacks, like so:

    keras.callbacks.LearningRateScheduler(schedule)

The iteration variable in this scheduler is the epoch, so the schedule function should take an epoch number as input and output a new learning rate.

My question, then, is: will either of these learning rate decay mechanisms work for me? I have an explicit for-loop, and each time I call model.fit it only does 1 epoch of training. If I use the callback approach, will it just keep feeding the same first-epoch value into the schedule and thereby never decay the learning rate? If I use the built-in decay in the optimizer, will the optimizer reset at each iteration and go back to the original learning rate, or will it remember to keep decreasing the learning rate through all the loops? The same question applies to other hyperparameters such as momentum (where applicable), which is a moving average of previous gradients. Does Keras keep track of these moving averages across data chunks and epochs when I break up my data in this manner?
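To make my question concrete, here is a rough sketch of the check I have in mind (assuming the Keras 2.x API and an already-compiled model; num_epochs and datachunks are placeholders for my own loop variables and chunk loading):

    from keras import backend as K

    # My rough guess at the optimizer-level schedule (I may be wrong about this):
    #   lr_t = lr / (1 + decay * iterations), applied once per batch update.
    for i in range(num_epochs):
        for X_chunk, Y_chunk in datachunks:
            model.fit(x=X_chunk, y=Y_chunk, epochs=1, verbose=0)
            # If the optimizer state survives across fit() calls, this counter keeps
            # growing (it counts batch-level updates); if it resets to 0, the decay
            # and momentum history are being thrown away each time.
            print("updates so far:", K.get_value(model.optimizer.iterations))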

Ideally, you should use a generator with large amounts of data. Your generator will only ever have to handle a single batch of data at a time. It should be something like:

    def myGenerator(batchSize):
        while True:
            # getSomeDataFromFile: your own function that reads one batch from disk
            x, y = getSomeDataFromFile(batchSize)
            yield (x, y)

Then you can call fit_generator to train your model (don't forget to set steps_per_epoch to the number of batches it takes to complete an epoch).
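A minimal sketch of that call, assuming the Keras 2.x API (the batch size, sample count, and decay schedule below are only illustrative):

    import math
    from keras.callbacks import LearningRateScheduler

    batch_size = 32
    num_samples = 1000000                    # illustrative: total number of samples across your files
    steps = num_samples // batch_size        # batches needed to complete one epoch

    # Epoch-indexed schedule: the callback passes in the current epoch number.
    def schedule(epoch):
        return 0.001 * math.exp(-0.1 * epoch)

    model.fit_generator(myGenerator(batch_size),
                        steps_per_epoch=steps,
                        epochs=20,
                        callbacks=[LearningRateScheduler(schedule)])

Because fit_generator sees all 20 epochs in a single call, the LearningRateScheduler receives the real epoch index each time, and the schedule behaves exactly as it would with an in-memory fit.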

Alternatively, if you want to keep your explicit loop and train one epoch at a time, you need to keep increasing the number of epochs and specify the starting epoch, like so (fit trains from initial_epoch up to epochs, so this runs exactly one epoch per call):

    model.fit(x, y, epochs=i + 1, initial_epoch=i)

This way the epoch counter keeps advancing across calls, so the learning rate will keep decaying over time.
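Put together with your chunked loop, that looks roughly like the sketch below (same assumptions and same illustrative schedule as above). As long as you don't re-compile the model between calls, the optimizer object is reused, so its internal state (iteration count, Adam's moment estimates, momentum) carries over across chunks and epochs as well.

    from keras.callbacks import LearningRateScheduler

    scheduler = LearningRateScheduler(schedule)       # same epoch-indexed schedule as above

    for i in range(num_epochs):                       # your outer epoch counter
        shuffle(datachunks)
        for X_chunk, Y_chunk in datachunks:
            # epochs=i + 1 with initial_epoch=i runs exactly one epoch of training,
            # and the callback sees the true epoch index i instead of 0 every time.
            model.fit(x=X_chunk, y=Y_chunk,
                      epochs=i + 1, initial_epoch=i,
                      callbacks=[scheduler])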
