Keras (TensorFlow) creates wrong tensor with shape (features, features) leading to OOM memory error

I hope this post finds you well. Here's my issue in summary:

  1. I am trying to build a DNN with 2 hidden layers in Keras.
  2. I have a data set of shape (5520, 34716), i.e. 5520 samples with 34716 feature values each. I am trying to perform 8-class classification.
  3. My server has 4 Tesla P100-PCIE-16GB GPUs.

I wish to perform 5-fold cross-validation on this data set.

My code is as follows:

    import tensorflow as tf
    from tensorflow.keras import backend as K  # assumed source of K; the import is not shown in the post
    from sklearn.model_selection import StratifiedKFold

    # seed, input_neurons and decode_onehot are defined elsewhere in the script (not shown).

    def test_kcv(data, labels):
        print("Performing Final K-Fold Cross Validation on entire dataset............")
        final_lr = 0.00001
        final_neurons = [500, 100]
        final_epochs = 50
        final_bs = 32

        kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        cvscores = []
        cm_stats = []
        counter = 1
        for train_indices, val_indices in kfold.split(data, decode_onehot(labels)):
            print("Beginning Model Number {}".format(counter))
            # Reset Keras/TensorFlow state left over from the previous fold
            tf.keras.backend.clear_session()
            K.clear_session()
            with tf.device('/gpu:0'):
                model = tf.keras.models.Sequential()
                model.add(tf.keras.layers.Flatten())
                # Input Layer
                model.add(tf.keras.layers.Dense(input_neurons, activation='relu', input_shape=(final_bs, 34716)))
                # Hidden Layers
                model.add(tf.keras.layers.Dense(final_neurons[0], activation='relu'))
                model.add(tf.keras.layers.Dense(final_neurons[1], activation='relu'))
                # Output Layer
                model.add(tf.keras.layers.Dense(8, activation='softmax'))
                print(data[train_indices].shape, labels[train_indices].shape)  # (4416, 34716) (4416, 8)

            model.compile(optimizer=tf.keras.optimizers.Adam(lr=final_lr), loss="categorical_crossentropy", metrics=['accuracy'])
            model.fit(x=data[train_indices], y=labels[train_indices], epochs=final_epochs, batch_size=final_bs, shuffle=True)
            model.summary()
            # Score this fold on its held-out validation split
            val_loss, val_acc = model.evaluate(data[val_indices], labels[val_indices])
            cvscores.append(val_acc)
            print("For model {}, The loss is: {}, the accuracy is: {}.".format(counter, round(val_loss, 5), round(val_acc, 5)))
            counter += 1

However, this is the error i got:但是,这是我得到的错误:

    2019-04-11 16:57:46.899057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2019-04-11 16:57:46.899174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-04-11 16:57:46.899187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
    2019-04-11 16:57:46.899195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
    2019-04-11 16:57:46.899943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15156 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:85:00.0, compute capability: 6.0)
    2019-04-11 16:58:51.016552: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.49GiB.  Current allocation summary follows.
    2019-04-11 16:58:51.016668: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256):   Total Chunks: 52, Chunks in use: 52. 13.0KiB allocated for chunks. 13.0KiB in use in bin. 572B client-requested in use in bin.
    2019-04-11 16:58:51.016717: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016740: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024):  Total Chunks: 10, Chunks in use: 10. 13.2KiB allocated for chunks. 13.2KiB in use in bin. 11.4KiB client-requested in use in bin.
    2019-04-11 16:58:51.016758: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016778: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096):  Total Chunks: 4, Chunks in use: 4. 25.0KiB allocated for chunks. 25.0KiB in use in bin. 25.0KiB client-requested in use in bin.
    2019-04-11 16:58:51.016795: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016812: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016839: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016855: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016877: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072):        Total Chunks: 3, Chunks in use: 3. 407.2KiB allocated for chunks. 407.2KiB in use in bin. 406.8KiB client-requested in use in bin.
    2019-04-11 16:58:51.016899: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144):        Total Chunks: 4, Chunks in use: 4. 1.22MiB allocated for chunks. 1.22MiB in use in bin. 1.22MiB client-requested in use in bin.
    2019-04-11 16:58:51.016917: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016933: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016952: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.016970: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304):       Total Chunks: 1, Chunks in use: 1. 4.24MiB allocated for chunks. 4.24MiB in use in bin. 4.24MiB client-requested in use in bin.
    2019-04-11 16:58:51.016988: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017004: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017025: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432):      Total Chunks: 4, Chunks in use: 4. 211.89MiB allocated for chunks. 211.89MiB in use in bin. 211.89MiB client-requested in use in bin.
    2019-04-11 16:58:51.017042: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017061: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
    2019-04-11 16:58:51.017080: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456):     Total Chunks: 4, Chunks in use: 3. 14.59GiB allocated for chunks. 13.47GiB in use in bin. 13.47GiB client-requested in use in bin.
    2019-04-11 16:58:51.017102: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 4.49GiB was 256.00MiB, Chunk State: 
    2019-04-11 16:58:51.017125: I tensorflow/core/common_runtime/bfc_allocator.cc:619]   Size: 1.12GiB | Requested Size: 0B | in_use: 0, prev:   Size: 256B | Requested Size: 4B | in_use: 1
    2019-04-11 16:58:51.017144: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000000 of size 1280
    2019-04-11 16:58:51.017160: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000500 of size 256
    2019-04-11 16:58:51.017175: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000600 of size 256
    2019-04-11 16:58:51.017188: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000700 of size 256
    2019-04-11 16:58:51.017201: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000800 of size 256
    2019-04-11 16:58:51.017214: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000900 of size 256
    2019-04-11 16:58:51.017226: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000a00 of size 256
    2019-04-11 16:58:51.017239: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000b00 of size 256
    2019-04-11 16:58:51.017253: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8000c00 of size 1792
    2019-04-11 16:58:51.017265: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001300 of size 256
    2019-04-11 16:58:51.017278: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001400 of size 256
    2019-04-11 16:58:51.017291: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001500 of size 1024
    2019-04-11 16:58:51.017303: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001900 of size 256
    2019-04-11 16:58:51.017316: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001a00 of size 256
    2019-04-11 16:58:51.017328: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001b00 of size 256
    2019-04-11 16:58:51.017340: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001c00 of size 256
    2019-04-11 16:58:51.017352: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001d00 of size 256
    2019-04-11 16:58:51.017366: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6be8001e00 of size 55545600
    2019-04-11 16:58:51.017378: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb4fad00 of size 256
    2019-04-11 16:58:51.017391: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb4fae00 of size 1792
    2019-04-11 16:58:51.017404: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb4fb500 of size 320000
    2019-04-11 16:58:51.017417: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb549700 of size 1024
    2019-04-11 16:58:51.017430: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb549b00 of size 6400
    2019-04-11 16:58:51.017442: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb54b400 of size 256
    2019-04-11 16:58:51.017455: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6beb54b500 of size 4820802816
    2019-04-11 16:58:51.017468: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6d0aac5200 of size 139008
    2019-04-11 16:58:51.017481: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6d0aae7100 of size 4820802816
    2019-04-11 16:58:51.017494: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2a060e00 of size 55545600
    2019-04-11 16:58:51.017506: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d559d00 of size 320000
    2019-04-11 16:58:51.017519: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a7f00 of size 6400
    2019-04-11 16:58:51.017532: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9800 of size 256
    2019-04-11 16:58:51.017544: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9900 of size 256
    2019-04-11 16:58:51.017556: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9a00 of size 256
    2019-04-11 16:58:51.017569: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9b00 of size 256
    2019-04-11 16:58:51.017581: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9c00 of size 256
    2019-04-11 16:58:51.017594: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9d00 of size 256
    2019-04-11 16:58:51.017606: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9e00 of size 256
    2019-04-11 16:58:51.017618: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5a9f00 of size 256
    2019-04-11 16:58:51.017630: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa000 of size 256
    2019-04-11 16:58:51.017643: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa100 of size 256
    2019-04-11 16:58:51.017655: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa200 of size 256
    2019-04-11 16:58:51.017668: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa300 of size 256
    2019-04-11 16:58:51.017680: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa400 of size 256
    2019-04-11 16:58:51.017693: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa500 of size 256
    2019-04-11 16:58:51.017705: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa600 of size 256
    2019-04-11 16:58:51.017719: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aa700 of size 1024
    2019-04-11 16:58:51.017732: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d5aab00 of size 4443648
    2019-04-11 16:58:51.017745: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7900 of size 256
    2019-04-11 16:58:51.017757: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7a00 of size 256
    2019-04-11 16:58:51.017770: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7b00 of size 256
    2019-04-11 16:58:51.017783: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7c00 of size 256
    2019-04-11 16:58:51.017796: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2d9e7d00 of size 139008
    2019-04-11 16:58:51.017808: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2da09c00 of size 256
    2019-04-11 16:58:51.017821: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e2da09d00 of size 55545600
    2019-04-11 16:58:51.017834: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f02c00 of size 256
    2019-04-11 16:58:51.017849: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f02d00 of size 1792
    2019-04-11 16:58:51.017863: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f03400 of size 256
    2019-04-11 16:58:51.017877: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f03500 of size 320000
    2019-04-11 16:58:51.017890: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51700 of size 256
    2019-04-11 16:58:51.017904: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51800 of size 1024
    2019-04-11 16:58:51.017918: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51c00 of size 256
    2019-04-11 16:58:51.017932: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f51d00 of size 6400
    2019-04-11 16:58:51.017945: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53600 of size 256
    2019-04-11 16:58:51.017959: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53700 of size 256
    2019-04-11 16:58:51.017972: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53800 of size 256
    2019-04-11 16:58:51.017986: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6e30f53900 of size 4820802816
    2019-04-11 16:58:51.018000: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504cd600 of size 256
    2019-04-11 16:58:51.018013: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504cd700 of size 139008
    2019-04-11 16:58:51.018028: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504ef600 of size 256
    2019-04-11 16:58:51.018042: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f504ef700 of size 55545600
    2019-04-11 16:58:51.018055: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8600 of size 256
    2019-04-11 16:58:51.018068: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8700 of size 1792
    2019-04-11 16:58:51.018082: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8e00 of size 256
    2019-04-11 16:58:51.018096: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f539e8f00 of size 320000
    2019-04-11 16:58:51.018110: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37100 of size 256
    2019-04-11 16:58:51.018123: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37200 of size 1024
    2019-04-11 16:58:51.018137: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37600 of size 256
    2019-04-11 16:58:51.018151: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a37700 of size 6400
    2019-04-11 16:58:51.018164: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a39000 of size 256
    2019-04-11 16:58:51.018178: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a39100 of size 256
    2019-04-11 16:58:51.018191: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f6f53a39200 of size 256
    2019-04-11 16:58:51.018205: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7f6f53a39300 of size 1201561600
    2019-04-11 16:58:51.018218: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
    2019-04-11 16:58:51.018236: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 52 Chunks of size 256 totalling 13.0KiB
    2019-04-11 16:58:51.018253: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 5 Chunks of size 1024 totalling 5.0KiB
    2019-04-11 16:58:51.018270: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
    2019-04-11 16:58:51.021083: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 1792 totalling 7.0KiB
    2019-04-11 16:58:51.021110: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 6400 totalling 25.0KiB
    2019-04-11 16:58:51.021127: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 139008 totalling 407.2KiB
    2019-04-11 16:58:51.021143: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 320000 totalling 1.22MiB
    2019-04-11 16:58:51.021159: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 4443648 totalling 4.24MiB
    2019-04-11 16:58:51.021174: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 55545600 totalling 211.89MiB
    2019-04-11 16:58:51.021190: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 4820802816 totalling 13.47GiB
    2019-04-11 16:58:51.021206: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 13.68GiB
    2019-04-11 16:58:51.021225: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
    Limit:                 15892345652
    InUse:                 14690784000
    MaxInUse:              14690784000
    NumAllocs:                      81
    MaxAllocSize:           4820802816

    2019-04-11 16:58:51.021260: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *********************************************************************************************_______
    2019-04-11 16:58:51.021316: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[34716,34716] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    Traceback (most recent call last):
      File "classify_tasks.py", line 401, in <module>
        test_kcv(total_data, total_label)
      File "classify_tasks.py", line 358, in test_kcv
        model.fit(x=data[train_indices], y=labels[train_indices], epochs=final_epochs, batch_size=final_bs, shuffle=True)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training.py", line 1639, in fit
        validation_steps=validation_steps)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 215, in fit_loop
        outs = f(ins_batch)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/backend.py", line 2986, in __call__
        run_metadata=self.run_metadata)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
        run_metadata_ptr)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[34716,34716] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
            [[{{node training/Adam/mul_3}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/ReadVariableOp_4, training/Adam/mul_3/ReadVariableOp)]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

            [[{{node loss/output_1_loss/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch/_103}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_281_l...ert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

As you can see, for some reason TensorFlow is trying to create a tensor of shape (34716, 34716). I am not sure why. I tried printing out the shapes of the input data, and they come out correctly as (4416, 34716) and (4416, 8) for the training data and labels. Please do let me know if you need any extra information from me. Thanks a lot!

The input shape you specify in the first dense layer is not correct in my opinion. If your data is of shape (features, samples), you should swap it to the format (samples, features) and specify (None, features) as the input shape of the first layer. In your example that would be (None, 4416).
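The sizes in the allocator log can also be checked with plain arithmetic. The sketch below is not part of the original post. It assumes `input_neurons` equals the feature count 34716 (its value is not shown in the question) and that the allocator rounds each request up to a multiple of 256 bytes. Under those assumptions the kernel of the first `Dense` layer alone is a float32 matrix of shape (34716, 34716), which matches the 4820802816-byte chunks in the log.

```python
FEATURES = 34716            # feature count stated in the question
BYTES_PER_FLOAT32 = 4
ALIGNMENT = 256             # assumed rounding granularity of the GPU allocator
GPU_LIMIT = 15892345652     # "Limit" value reported in the log


def aligned_bytes(rows, cols):
    """Bytes for a float32 (rows, cols) tensor, rounded up to ALIGNMENT."""
    raw = rows * cols * BYTES_PER_FLOAT32
    return -(-raw // ALIGNMENT) * ALIGNMENT


# Kernel of Dense(34716) applied to 34716 input features (assumed layer width).
square_kernel = aligned_bytes(FEATURES, FEATURES)
print(square_kernel)                                   # 4820802816, as in the log
print(round(FEATURES * FEATURES * BYTES_PER_FLOAT32 / 2**30, 2))  # 4.49 (GiB)

# How many tensors of that size fit under the reported limit.
copies_that_fit = GPU_LIMIT // square_kernel
print(copies_that_fit)                                 # 3

# For comparison: kernel of a 500-unit first layer on the same input.
small_kernel = aligned_bytes(FEATURES, 500)
print(small_kernel < 100 * 2**20)                      # True, well under 100 MiB
```

Three tensors of this size fit on the card and a fourth does not, which is consistent with the log: three 4820802816-byte chunks are in use and the failing op `training/Adam/mul_3` asks for another 4.49GiB. If the assumption about `input_neurons` holds, making the first layer much narrower (for example `final_neurons[0]` units) should remove the (34716, 34716) tensor. Note also that Keras expects `input_shape` without the batch dimension, so for this data it would be `input_shape=(34716,)` on the first layer of the model.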
