How to limit GPU memory usage in TFLearn?

I'm using TFLearn with AlexNet to build a self-driving car in GTA V. I've already trained the network, but when I try to run GTA and the network simultaneously I get the error CUBLAS_STATUS_ALLOC_FAILED, which I guess means I've run out of GPU memory.

This is my AlexNet file:

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.normalization import local_response_normalization


def alexnet(width, height, lr):
    network = input_data(shape=[None, width, height, 1], name='input')
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 3, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=lr, name='targets')

    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log')

    return model

I've tried adding this:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)
session.run(tf.global_variables_initializer())

and then passing session=session to tflearn.DNN like this:

    model = tflearn.DNN(network, checkpoint_path='model_data/model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log', session=session)

but that doesn't work either: I get an error saying that some variables are not initialized.

For instance, when I try to use the model in this file:

import numpy as np
from alexnet import alexnet

WIDTH = 80
HEIGHT = 60
LR = 1e-3
EPOCHS = 8
MODEL_NAME = 'pygta5-car-{}-{}-{}-epochs.model'. \
    format(LR, 'alexnet', EPOCHS)

model = alexnet(WIDTH, HEIGHT, LR)

train_data = np.load('training_data.npy')

train = train_data[:-100]
test = train_data[-100:]

train_x = np.array([i[0] for i in train]).reshape([-1, WIDTH, HEIGHT, 1]) # Take only the images
train_y = np.array([i[1] for i in train]) # Take only the labels

test_x = np.array([i[0] for i in test]).reshape([-1, WIDTH, HEIGHT, 1]) # Take only the images
test_y = np.array([i[1] for i in test]) # Take only the labels

model.fit({'input': train_x}, {'targets': train_y},
          n_epoch=EPOCHS, validation_set=({'input': test_x}, {'targets': test_y}),
          snapshot_step=500, run_id=MODEL_NAME, show_metric=True)


model.save('models/model.tfl')

I get this error during the execution of model.fit():

"C:\Program Files\Python36\python.exe" C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py
WARNING:tensorflow:From C:\Program Files\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
2018-01-09 23:49:30.486827: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-01-09 23:49:30.947896: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:23:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-01-09 23:49:30.948297: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:23:00.0, compute capability: 6.1)
2018-01-09 23:49:32.382017: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:23:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-0.001-alexnet-8-epochs.model
Log directory: log/
---------------------------------
Training samples: 7775
Validation samples: 100
--
2018-01-09 23:49:34.924216: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Failed precondition: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
     [[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py", line 26, in <module>
    snapshot_step=500, run_id=MODEL_NAME, show_metric=True)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\models\dnn.py", line 216, in fit
    callbacks=callbacks)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 339, in fit
    show_metric)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 818, in _train
    feed_batch)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
     [[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Crossentropy/Mean/moving_avg/read', defined at:
  File "C:/Users/Elia/PycharmProjects/SelfDrivingGrandTheftAutoV/v2/train_model.py", line 11, in <module>
    model = alexnet(WIDTH, HEIGHT, LR)
  File "C:\Users\Elia\PycharmProjects\SelfDrivingGrandTheftAutoV\v2\alexnet.py", line 37, in alexnet
    max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log', session=session)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\models\dnn.py", line 65, in __init__
    best_val_accuracy=best_val_accuracy)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 131, in __init__
    clip_gradients)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 693, in initialize_training_ops
    ema_num_updates=self.training_steps)
  File "C:\Program Files\Python36\lib\site-packages\tflearn\summaries.py", line 239, in add_loss_summaries
    loss_averages_op = loss_averages.apply([loss] + other_losses)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\moving_averages.py", line 401, in apply
    colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 151, in create_slot_with_initializer
    dtype)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\training\slot_creator.py", line 67, in _create_slot_var
    validate_shape=validate_shape)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1203, in get_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1092, in get_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 805, in _get_single_variable
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 213, in __init__
    constraint=constraint)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 356, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 125, in identity
    return gen_array_ops.identity(input, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 2070, in identity
    "Identity", input=input, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Crossentropy/Mean/moving_avg
     [[Node: Crossentropy/Mean/moving_avg/read = Identity[T=DT_FLOAT, _class=["loc:@Crossentropy/Mean/moving_avg"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Crossentropy/Mean/moving_avg)]]
     [[Node: Conv2D_1/W/read/_179 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_748_Conv2D_1/W/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


Process finished with exit code 1

Is there a way to fix this problem, or a better way to limit GPU memory usage in TFLearn?

I found this question when I was having the same problem. I don't think it will still be relevant to you, but it might be for others.

This error occurs when the model is loaded into video RAM (VRAM) and the allocation fails because there isn't enough for both GTA V and your model.

I'm new to TFLearn too, so I can't explain why your solution isn't working.

To limit GPU memory usage, you can add the following line before model = tflearn.DNN(...) in your alexnet function:

tflearn.init_graph(num_cores=4, gpu_memory_fraction=0.5)

TFLearn Documentation

I don't think num_cores=4 is actually necessary, but I haven't tested without it.
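As a rough worked example of what gpu_memory_fraction=0.5 means on the cards mentioned in this thread, here is a small sketch. It assumes TensorFlow 1.x behaviour, where the fraction is multiplied by the card's total memory (my reading of the GPU allocator, not something the TFLearn docs state), so the model is capped at about half the VRAM and the game has to fit in the rest. The helper names tf_memory_cap_gib and game_budget_gib are hypothetical.

```python
def tf_memory_cap_gib(total_gib: float, fraction: float) -> float:
    """Most VRAM TensorFlow may reserve for this process.

    Assumption: the fraction is applied to the card's *total* memory,
    as in TensorFlow 1.x's GPU allocator.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    return total_gib * fraction


def game_budget_gib(total_gib: float, fraction: float, other_gib: float = 0.0) -> float:
    """VRAM left for the game once TensorFlow has taken its share.

    `other_gib` covers anything else already on the card (desktop, browser, ...).
    """
    return total_gib - tf_memory_cap_gib(total_gib, fraction) - other_gib


if __name__ == "__main__":
    # Cards mentioned in this thread; nominal totals in GiB.
    for card, total in [("GTX 1060 6GB", 6.0), ("RTX 2080 8GB", 8.0)]:
        cap = tf_memory_cap_gib(total, 0.5)
        left = game_budget_gib(total, 0.5)
        print(f"{card}: model <= {cap:.1f} GiB, game must fit in {left:.1f} GiB")
```

The log in the question reports totalMemory: 6.00GiB and freeMemory: 4.97GiB, so roughly 1 GiB was already in use when it was taken; game_budget_gib(6.0, 0.5, other_gib=1.0) gives 2.0, which shows how tight 0.5 can be on a 6 GB card.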

You also need to monitor your VRAM usage with the game running but without AlexNet, to see how much the game needs by itself, because the line above only works if the game uses less than 50% of VRAM (you can change the value).
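One way to do that monitoring, as a sketch: query nvidia-smi while GTA V is running (and AlexNet is not), then turn the measured usage into a gpu_memory_fraction. This assumes the NVIDIA driver's nvidia-smi is on PATH; the function names and the 512 MiB safety headroom are hypothetical choices of mine, not part of TFLearn.

```python
import math
import shutil
import subprocess
from typing import List, Tuple


def parse_memory_csv(output: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB, one line per GPU) printed by nvidia-smi."""
    gpus = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        gpus.append((used, total))
    return gpus


def query_vram() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for the used/total memory of every GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_csv(result.stdout)


def suggest_fraction(used_mib: int, total_mib: int, headroom_mib: int = 512) -> float:
    """Largest fraction (rounded down to 2 decimals) that leaves the current
    usage plus `headroom_mib` untouched."""
    spare = total_mib - used_mib - headroom_mib
    if spare <= 0:
        raise ValueError("no VRAM left for the model; lower the game settings")
    return math.floor(spare / total_mib * 100) / 100


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        used, total = query_vram()[0]  # first GPU only
        print(f"used {used} / {total} MiB -> gpu_memory_fraction={suggest_fraction(used, total)}")
```

Rounding down is deliberate: it keeps the suggestion conservative. If the printed value is lower than 0.5, pass it to tflearn.init_graph(gpu_memory_fraction=...) instead, or turn the game's settings down.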

I'm trying something similar in Forza Horizon 3 (which is poorly optimized for PC), and by turning down the settings I was able to reduce its VRAM usage from 60% to 40%.

I've got it working with an 8 GB RTX 2080, so it should work with your 6 GB GTX 1060.
