
Keras + TensorFlow Realtime training chart

I have the following code running inside a Jupyter notebook:

# Visualize training history
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, verbose=0)
# list all data in history
print(history.history.keys())
# summarize history for accuracy
# (TF 2.x logs these under 'accuracy'/'val_accuracy'; older Keras used 'acc'/'val_acc')
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

The code collects the per-epoch training history, then displays it once training has finished.


Q: How can I make the chart update while training, so I can see the changes in real time?

There is the livelossplot Python package for live training loss plots in Jupyter Notebook for Keras (disclaimer: I am the author).
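It is available on PyPI:

pip install livelossplot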

from livelossplot import PlotLossesKeras

model.fit(X_train, Y_train,
          epochs=10,
          validation_data=(X_test, Y_test),
          callbacks=[PlotLossesKeras()],
          verbose=0)

To see how it works, look at its source, especially this file: https://github.com/stared/livelossplot/blob/master/livelossplot/outputs/matplotlib_plot.py (from IPython.display import clear_output and clear_output(wait=True)).

A fair disclaimer: it does interfere with Keras output.
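The same clear_output trick is easy to hand-roll if you would rather avoid the dependency. A minimal sketch, assuming a Jupyter notebook and TF 2.x metric names ('loss'/'val_loss'):

import matplotlib.pyplot as plt
import tensorflow as tf
from IPython.display import clear_output

class PlotLosses(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.losses, self.val_losses = [], []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))
        self.val_losses.append(logs.get('val_loss'))
        clear_output(wait=True)  # redraw in place instead of stacking figures
        plt.plot(self.losses, label='train')
        plt.plot(self.val_losses, label='validation')
        plt.xlabel('epoch')
        plt.ylabel('loss')
        plt.legend(loc='upper left')
        plt.show()

# model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10,
#           verbose=0, callbacks=[PlotLosses()])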


Keras comes with a callback for TensorBoard.

You can easily add this behaviour to your model and then just run tensorboard on top of the logging data.

from tensorflow.keras.callbacks import TensorBoard

callbacks = [TensorBoard(log_dir='./logs')]
result = model.fit(X, Y, ..., callbacks=callbacks)

And then in your shell:

tensorboard --logdir=./logs
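If you would rather keep TensorBoard inside the notebook itself, recent TensorBoard versions (1.14+) also ship Jupyter magics:

%load_ext tensorboard
%tensorboard --logdir ./logs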

If you need it in your notebook, you can also write your own callback to get metrics while training:

from tensorflow.keras.callbacks import Callback

class LogCallback(Callback):

    def on_epoch_end(self, epoch, logs=None):
        # TF 2.x logs the training accuracy under "accuracy"
        print(logs["accuracy"])

This would get the training accuracy at the end of the current epoch and print it. There's some good documentation around it on the official Keras site.
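Hooking it into training is just one more callbacks entry (verbose=0 keeps Keras' own progress bar from interleaving with the printout):

model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10,
          verbose=0, callbacks=[LogCallback()])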

This gives you an idea of the simplest approach.

[ Sample ]:

# https://stackoverflow.com/questions/71748896/how-to-plot-a-graph-of-training-time-and-batch-size-of-neural-network

import os
from os.path import exists

import matplotlib.pyplot as plt
import tensorflow as tf

import time
import h5py

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
None
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)
print(physical_devices)
print(config)

os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'
print(os.getenv('TF_GPU_ALLOCATOR'))

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
epoch_1_time = [ ]
epoch_5_time = [ ]
epoch_10_time = [ ]
epoch_50_time = [ ]
epoch_100_time = [ ]

database_buffer = "F:\\models\\buffer\\" + os.path.basename(__file__).split('.')[0] + "\\TF_DataSets_01.h5"
database_buffer_dir = os.path.dirname(database_buffer)

# log_dir and file_writer are used by the callbacks below but were never
# defined in the original snippet; assumed here, matching the tensorboard
# commands at the end of the script.
log_dir = "F:\\models\\checkpoint\\" + os.path.basename(__file__).split('.')[0]
file_writer = tf.summary.create_file_writer(os.path.join(log_dir, 'validation'))

if not exists(database_buffer_dir) :
    # makedirs, so missing parent directories are created as well
    os.makedirs(database_buffer_dir)
    print("Create directory: " + database_buffer_dir)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Functions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# ...

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
DataSet
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
# Create hdf5 file
hdf5_file = h5py.File(database_buffer, mode='w')

# Train images
hdf5_file['x_train'] = train_images
hdf5_file['y_train'] = train_labels

# Test images
hdf5_file['x_test'] = test_images
hdf5_file['y_test'] = test_labels

hdf5_file.close()

# Visualize dataset train sample
hdf5_file = h5py.File(database_buffer,  mode='r')

# Load features
# x_train = hdf5_file['x_train'][0: 50000]
# x_test = hdf5_file['x_test'][0: 10000]
# y_train = hdf5_file['y_train'][0: 50000]
# y_test = hdf5_file['y_test'][0: 10000]

x_train = hdf5_file['x_train'][0: 100]
x_test = hdf5_file['x_test'][0: 100]
y_train = hdf5_file['y_train'][0: 100]
y_test = hdf5_file['y_test'][0: 100]

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=( 32, 32, 3 )),
        tf.keras.layers.Normalization(mean=3., variance=2.),
        tf.keras.layers.Normalization(mean=4., variance=6.),
        tf.keras.layers.Conv2DTranspose(2, 3, activation='relu', padding="same"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4 * 256), 
        tf.keras.layers.Reshape((4 * 256, 1)),

        tf.keras.layers.LSTM(128, return_sequences=True, return_state=False),
        tf.keras.layers.LSTM(128, name='LSTM256'),
        tf.keras.layers.Dropout(0.2),
])

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(64, activation='relu', name='dense64'))
model.add(tf.keras.layers.Dense(7))
model.summary()

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Callback
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class custom_callback_5(tf.keras.callbacks.Callback):
    global epoch_5
    val_dir = os.path.join(log_dir, 'validation')
    print('val_dir: ' + val_dir)
    
    epoch_5 = 0
    
    def on_epoch_end( self, epoch, logs={} ):
        global epoch_5

        time_counter = time.perf_counter()
        epoch_1_time.append( epoch )
        
        if epoch == 1 :
            ###
            epoch_5 = time_counter

        if epoch % 5 == 0 :
            epoch_5 = time_counter
            
        epoch_5_time.append( epoch_5 )
        ### updates ###
        with file_writer.as_default():
            tf.summary.scalar("epoch_5", epoch_5, step=epoch)
            file_writer.flush()

custom_callback_5 = custom_callback_5()

class custom_callback_10(tf.keras.callbacks.Callback):
    global epoch_10
    
    epoch_10 = 0
    
    def on_epoch_end( self, epoch, logs={} ):
        global epoch_10

        time_counter = time.perf_counter()
        #epoch_1_time.append( epoch )
        
        if epoch == 1 :
            ###
            epoch_10 = time_counter

        if epoch % 10 == 0 :
            epoch_10 = time_counter
            
        epoch_10_time.append( epoch_10 )
        ### updates ###
        with file_writer.as_default():
            tf.summary.scalar("epoch_10", epoch_10, step=epoch)
            file_writer.flush()

custom_callback_10 = custom_callback_10()

class custom_callback_50(tf.keras.callbacks.Callback):
    global epoch_50
    
    epoch_50 = 0
    
    def on_epoch_end( self, epoch, logs={} ):
        global epoch_50

        time_counter = time.perf_counter()
        #epoch_1_time.append( epoch )
        
        if epoch == 1 :
            ###
            epoch_50 = time_counter

        if epoch % 50 == 0 :
            epoch_50 = time_counter
            
        epoch_50_time.append( epoch_50 )
        ### updates ###
        with file_writer.as_default():
            tf.summary.scalar("epoch_50", epoch_50, step=epoch)
            file_writer.flush()

custom_callback_50 = custom_callback_50()

class custom_callback_100(tf.keras.callbacks.Callback):
    global epoch_100
    
    epoch_100 = 0
    
    def on_epoch_end( self, epoch, logs={} ):
        global epoch_100

        time_counter = time.perf_counter()
        #epoch_1_time.append( epoch )
        
        if epoch == 1 :
            ###
            epoch_100 = time_counter

        if epoch % 100 == 0 :
            epoch_100 = time_counter
            
        epoch_100_time.append( epoch_100 )
        ### updates ###
        with file_writer.as_default():
             tf.summary.scalar("epoch_100", epoch_100, step=epoch)
             file_writer.flush()

custom_callback_100 = custom_callback_100()

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Optimizer
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
optimizer = tf.keras.optimizers.Nadam( learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, name='Nadam' )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Loss Fn
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""                               
lossfn = tf.keras.losses.MeanSquaredLogarithmicError(reduction=tf.keras.losses.Reduction.AUTO, name='mean_squared_logarithmic_error')

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Summary
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model.compile(optimizer=optimizer, loss=lossfn)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
history = model.fit(x_train, y_train, epochs=1000, batch_size=5, validation_data=(x_test, y_test), callbacks=[custom_callback_5])
history = model.fit(x_train, y_train, epochs=1000, batch_size=10, validation_data=(x_test, y_test), callbacks=[custom_callback_10])
history = model.fit(x_train, y_train, epochs=1000, batch_size=50, validation_data=(x_test, y_test), callbacks=[custom_callback_50])
history = model.fit(x_train, y_train, epochs=1000, batch_size=100, validation_data=(x_test, y_test), callbacks=[custom_callback_100])

plt.plot(epoch_1_time, epoch_5_time)
plt.plot(epoch_1_time, epoch_10_time)
plt.plot(epoch_1_time, epoch_50_time)
plt.plot(epoch_1_time, epoch_100_time)
plt.legend(["epoch_5_time", "epoch_10_time", "epoch_50_time", "epoch_100_time"])
plt.show()
plt.close()

input('...')
## tensorboard --inspect --logdir="F:\\models\\checkpoint\\test_tf_plot_graph\\"
## tensorboard --logdir="F:\\models\\checkpoint\\test_tf_plot_graph\\"

[ Output ]:

Event statistics for F:\\models\\checkpoint\\test_tf_plot_graph\validation:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
   first_step           20
   last_step            6
   max_step             140
   min_step             0
   num_steps            14
   outoforder_steps     [(20, 0), (40, 1), (60, 2), (80, 3), (100, 4), (120, 5), (140, 6)]
======================================================================

