Keras (TensorFlow) creates a tensor of shape (features, features), leading to an OOM error

I hope this post finds you well. Here's my issue in summary:

  1. I am trying to build a DNN with 2 hidden layers in Keras.
  2. I have a data set of shape (5520, 34716), i.e. 5520 samples with 34716 feature values each. I am trying to perform 8-class classification.
  3. My server has 4 Tesla P100-PCIE-16GB GPUs.

I wish to perform 5-fold cross-validation on this data set.

My code is as follows:

    def test_kcv(data, labels):
        print("Performing Final K-Fold Cross Validation on entire dataset............")
        final_lr = 0.00001
        final_neurons = [500,100]
        final_epochs = 50
        final_bs = 32

        kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        cvscores = []
        cm_stats = []
        counter = 1
        for train_indices, val_indices in kfold.split(data, decode_onehot(labels)):
            print("Beginning Model Number {}".format(counter))
            tf.keras.backend.clear_session()
            K.clear_session()
            with tf.device('/gpu:0'):
                model = tf.keras.models.Sequential()
                model.add(tf.keras.layers.Flatten())
                # Input Layer
                model.add(tf.keras.layers.Dense(input_neurons, activation = 'relu', input_shape=(final_bs, 34716)))
                # Hidden Layers
                model.add(tf.keras.layers.Dense(final_neurons[0], activation = 'relu'))
                model.add(tf.keras.layers.Dense(final_neurons[1], activation = 'relu'))
                # Output Layers
                model.add(tf.keras.layers.Dense(8, activation = 'softmax'))
                print(data[train_indices].shape, labels[train_indices].shape) # (4416, 34716) (4416, 8)

            model.compile(optimizer = tf.keras.optimizers.Adam(lr=final_lr) , loss="categorical_crossentropy", metrics=['accuracy'])
            model.fit(x=data[train_indices], y=labels[train_indices], epochs=final_epochs, batch_size=final_bs, shuffle=True)
            model.summary()
            val_loss, val_acc = model.evaluate(data[val_indices], labels[val_indices])
            cvscores.append(val_acc)
            print("For model {}, The loss is: {}, the accuracy is: {}.".format(counter, round(val_loss, 5), round(val_acc, 5)))

However, this is the error I got:

    2019-04-11 16:57:46.899057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2019-04-11 16:57:46.899174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-04-11 16:57:46.899187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
    2019-04-11 16:57:46.899195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
    2019-04-11 16:57:46.899943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15156 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:85:00.0, compute capability: 6.0)
    2019-04-11 16:58:51.016552: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.49GiB.  Current allocation summary follows.
    2019-04-11 16:58:51.016668: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256):   Total Chunks: 52, Chunks in use: 52. 13.0KiB allocated for chunks. 13.0KiB in use in bin. 572B client-requested in use in bin.
    2019-04-11 16:58:51.016717: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016740: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024):  Total Chunks: 10, Chunks in use: 10. 13.2KiB allocated for chunks. 13.2KiB in use in bin. 11.4KiB client-requested in use in bin.
    2019-04-11 16:58:51.016758: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016778: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096):  Total Chunks: 4, Chunks in use: 4. 25.0KiB allocated for chunks. 25.0KiB in use in bin. 25.0KiB client-requested in use in bin.
    2019-04-11 16:58:51.016795: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016812: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016839: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016855: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016877: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072):        Total Chunks: 3, Chunks in use: 3. 407.2KiB allocated for chunks. 407.2KiB in use in bin. 406.8KiB client-requested in use in bin.
    2019-04-11 16:58:51.016899: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144):        Total Chunks: 4, Chunks in use: 4. 1.22MiB allocated for chunks. 1.22MiB in use in bin. 1.22MiB client-requested in use in bin.
    2019-04-11 16:58:51.016917: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016933: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016952: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016970: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304):       Total Chunks: 1, Chunks in use: 1. 4.24MiB allocated for chunks. 4.24MiB in use in bin. 4.24MiB client-requested in use in bin.
    2019-04-11 16:58:51.016988: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017004: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017025: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432):      Total Chunks: 4, Chunks in use: 4. 211.89MiB allocated for chunks. 211.89MiB in use in bin. 211.89MiB client-requested in use in bin.
    2019-04-11 16:58:51.017042: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017061: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017080: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456):     Total Chunks: 4, Chunks in use: 3. 14.59GiB allocated for chunks. 13.47GiB in use in bin. 13.47GiB client-requested in use in bin.
    2019-04-11 16:58:51.017102: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 4.49GiB was 256.00MiB, Chunk State: 
    2019-04-11 16:58:51.017125: I tensorflow/core/common_runtime/bfc_allocator.cc:619]   Size: 1.12GiB | Requested Size: 0B | in_use: 0, prev:   Size: 256B | Requested Size: 4B | in_use: 1
    2019-04-11 16:58:51.017144: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000000 of size 1280
    2019-04-11 16:58:51.017160: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000500 of size 256
    2019-04-11 16:58:51.017175: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000600 of size 256
    2019-04-11 16:58:51.017188: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000700 of size 256
    2019-04-11 16:58:51.017201: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000800 of size 256
    2019-04-11 16:58:51.017214: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000900 of size 256
    2019-04-11 16:58:51.017226: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000a00 of size 256
    2019-04-11 16:58:51.017239: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000b00 of size 256
    2019-04-11 16:58:51.017253: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000c00 of size 1792
    2019-04-11 16:58:51.017265: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001300 of size 256
    2019-04-11 16:58:51.017278: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001400 of size 256
    2019-04-11 16:58:51.017291: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001500 of size 1024
    2019-04-11 16:58:51.017303: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001900 of size 256
    2019-04-11 16:58:51.017316: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001a00 of size 256
    2019-04-11 16:58:51.017328: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001b00 of size 256
    2019-04-11 16:58:51.017340: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001c00 of size 256
    2019-04-11 16:58:51.017352: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001d00 of size 256
    2019-04-11 16:58:51.017366: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001e00 of size 55545600
    2019-04-11 16:58:51.017378: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb4fad00 of size 256
    2019-04-11 16:58:51.017391: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb4fae00 of size 1792
    2019-04-11 16:58:51.017404: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb4fb500 of size 320000
    2019-04-11 16:58:51.017417: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb549700 of size 1024
    2019-04-11 16:58:51.017430: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb549b00 of size 6400
    2019-04-11 16:58:51.017442: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb54b400 of size 256
    2019-04-11 16:58:51.017455: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb54b500 of size 4820802816
    2019-04-11 16:58:51.017468: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6d0aac5200 of size 139008
    2019-04-11 16:58:51.017481: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6d0aae7100 of size 4820802816
    2019-04-11 16:58:51.017494: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2a060e00 of size 55545600
    2019-04-11 16:58:51.017506: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d559d00 of size 320000
    2019-04-11 16:58:51.017519: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a7f00 of size 6400
    2019-04-11 16:58:51.017532: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9800 of size 256
    2019-04-11 16:58:51.017544: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9900 of size 256
    2019-04-11 16:58:51.017556: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9a00 of size 256
    2019-04-11 16:58:51.017569: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9b00 of size 256
    2019-04-11 16:58:51.017581: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9c00 of size 256
    2019-04-11 16:58:51.017594: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9d00 of size 256
    2019-04-11 16:58:51.017606: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9e00 of size 256
    2019-04-11 16:58:51.017618: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9f00 of size 256
    2019-04-11 16:58:51.017630: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa000 of size 256
    2019-04-11 16:58:51.017643: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa100 of size 256
    2019-04-11 16:58:51.017655: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa200 of size 256
    2019-04-11 16:58:51.017668: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa300 of size 256
    2019-04-11 16:58:51.017680: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa400 of size 256
    2019-04-11 16:58:51.017693: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa500 of size 256
    2019-04-11 16:58:51.017705: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa600 of size 256
    2019-04-11 16:58:51.017719: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa700 of size 1024
    2019-04-11 16:58:51.017732: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aab00 of size 4443648
    2019-04-11 16:58:51.017745: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7900 of size 256
    2019-04-11 16:58:51.017757: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7a00 of size 256
    2019-04-11 16:58:51.017770: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7b00 of size 256
    2019-04-11 16:58:51.017783: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7c00 of size 256
    2019-04-11 16:58:51.017796: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7d00 of size 139008
    2019-04-11 16:58:51.017808: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2da09c00 of size 256
    2019-04-11 16:58:51.017821: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2da09d00 of size 55545600
    2019-04-11 16:58:51.017834: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f02c00 of size 256
    2019-04-11 16:58:51.017849: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f02d00 of size 1792
    2019-04-11 16:58:51.017863: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f03400 of size 256
    2019-04-11 16:58:51.017877: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f03500 of size 320000
    2019-04-11 16:58:51.017890: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51700 of size 256
    2019-04-11 16:58:51.017904: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51800 of size 1024
    2019-04-11 16:58:51.017918: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51c00 of size 256
    2019-04-11 16:58:51.017932: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51d00 of size 6400
    2019-04-11 16:58:51.017945: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53600 of size 256
    2019-04-11 16:58:51.017959: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53700 of size 256
    2019-04-11 16:58:51.017972: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53800 of size 256
    2019-04-11 16:58:51.017986: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53900 of size 4820802816
    2019-04-11 16:58:51.018000: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504cd600 of size 256
    2019-04-11 16:58:51.018013: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504cd700 of size 139008
    2019-04-11 16:58:51.018028: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504ef600 of size 256
    2019-04-11 16:58:51.018042: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504ef700 of size 55545600
    2019-04-11 16:58:51.018055: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8600 of size 256
    2019-04-11 16:58:51.018068: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8700 of size 1792
    2019-04-11 16:58:51.018082: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8e00 of size 256
    2019-04-11 16:58:51.018096: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8f00 of size 320000
    2019-04-11 16:58:51.018110: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37100 of size 256
    2019-04-11 16:58:51.018123: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37200 of size 1024
    2019-04-11 16:58:51.018137: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37600 of size 256
    2019-04-11 16:58:51.018151: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37700 of size 6400
    2019-04-11 16:58:51.018164: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a39000 of size 256
    2019-04-11 16:58:51.018178: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a39100 of size 256
    2019-04-11 16:58:51.018191: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a39200 of size 256
    2019-04-11 16:58:51.018205: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7f6f53a39300 of size 1201561600
    2019-04-11 16:58:51.018218: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
    2019-04-11 16:58:51.018236: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 52 Chunks of size 256 totalling 13.0KiB
    2019-04-11 16:58:51.018253: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 5 Chunks of size 1024 totalling 5.0KiB
    2019-04-11 16:58:51.018270: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
    2019-04-11 16:58:51.021083: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 1792 totalling 7.0KiB
    2019-04-11 16:58:51.021110: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 6400 totalling 25.0KiB
    2019-04-11 16:58:51.021127: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 139008 totalling 407.2KiB
    2019-04-11 16:58:51.021143: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 320000 totalling 1.22MiB
    2019-04-11 16:58:51.021159: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 4443648 totalling 4.24MiB
    2019-04-11 16:58:51.021174: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 55545600 totalling 211.89MiB
    2019-04-11 16:58:51.021190: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 4820802816 totalling 13.47GiB
    2019-04-11 16:58:51.021206: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 13.68GiB
    2019-04-11 16:58:51.021225: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
    Limit:                 15892345652
    InUse:                 14690784000
    MaxInUse:              14690784000
    NumAllocs:                      81
    MaxAllocSize:           4820802816

    2019-04-11 16:58:51.021260: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *********************************************************************************************_______
    2019-04-11 16:58:51.021316: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[34716,34716] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    Traceback (most recent call last):
      File "classify_tasks.py", line 401, in <module>
        test_kcv(total_data, total_label)
      File "classify_tasks.py", line 358, in test_kcv
        model.fit(x=data[train_indices], y=labels[train_indices], epochs=final_epochs, batch_size=final_bs, shuffle=True)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training.py", line 1639, in fit
        validation_steps=validation_steps)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 215, in fit_loop
        outs = f(ins_batch)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/backend.py", line 2986, in __call__
        run_metadata=self.run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
        run_metadata_ptr)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[34716,34716] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
            [[{{node training/Adam/mul_3}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/ReadVariableOp_4, training/Adam/mul_3/ReadVariableOp)]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

            [[{{node loss/output_1_loss/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch/_103}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_281_l...ert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

As you can see, for some reason TensorFlow is trying to create a tensor of shape (34716, 34716). I am not sure why. I tried printing the shapes of the input arrays, and they come out correctly as (4416, 34716) and (4416, 8) for the training data and training labels. Please let me know if you need any more information from me. Thanks a lot!

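The input shape you specify in the first Dense layer is not correct in my opinion. Keras expects input_shape to describe a single sample, without the batch dimension, so (final_bs, 34716) should not be used there. If your data were of shape (features, samples) you would have to swap it to (samples, features), but yours is already (4416, 34716), so pass input_shape=(34716,), which corresponds to a batch shape of (None, 34716). Also check the value of input_neurons: the tensor in the error has shape (34716, 34716), which is what the kernel of a Dense layer with 34716 units applied to 34716 inputs would look like.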
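
The sizes in the allocator log can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on two assumptions that are not stated in the post: that `input_neurons` (which is not defined in the posted code) equals the feature count 34716, and that the allocator rounds each request up to a multiple of 256 bytes. The helper names `kernel_bytes`, `round_up` and `gib` are made up for this example.

```python
# Back-of-the-envelope check of the sizes reported in the allocator log.
# Assumption: input_neurons == 34716 (it is not defined in the posted code).
FEATURES = 34716
FLOAT32_BYTES = 4
ALIGNMENT = 256  # assumed allocator rounding; it matches the chunk sizes in the log


def kernel_bytes(n_in, n_out):
    """Raw size in bytes of a float32 Dense kernel of shape (n_in, n_out)."""
    return n_in * n_out * FLOAT32_BYTES


def round_up(n_bytes, multiple=ALIGNMENT):
    """Round n_bytes up to the next multiple of `multiple`."""
    return -(-n_bytes // multiple) * multiple


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


# A Dense layer with 34716 units on 34716 inputs has a (34716, 34716) kernel.
square = round_up(kernel_bytes(FEATURES, FEATURES))
print(square)                      # 4820802816, the MaxAllocSize in the log
print(round(gib(square), 2))       # 4.49

# Three tensors of that size: possibly the kernel plus Adam's two moment estimates.
print(round(gib(3 * square), 2))   # 13.47

# For comparison: the 34716 features fed straight into a 500-unit layer.
small = round_up(kernel_bytes(FEATURES, 500))
print(round(3 * small / 2**20, 1)) # about 198.6 (MiB)
```

If these assumptions hold, the three in-use 4820802816-byte chunks in the log already occupy about 13.47 GiB of the roughly 15 GiB available, so a fourth allocation of the same size (here requested by `training/Adam/mul_3`) cannot succeed. Dropping the 34716-unit layer, or making it much narrower, would shrink the first kernel by roughly two orders of magnitude.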
