
How to calculate Categorical Cross-Entropy by hand?

When I calculate binary cross-entropy by hand, I apply sigmoid to get the probabilities, then use the cross-entropy formula and take the mean of the result:

logits = tf.constant([-1, -1, 0, 1, 2.])
labels = tf.constant([0, 0, 1, 1, 1.])

probs = tf.nn.sigmoid(logits)
loss = labels * (-tf.math.log(probs)) + (1 - labels) * (-tf.math.log(1 - probs))
print(tf.reduce_mean(loss).numpy()) # 0.35197204

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
loss = cross_entropy(labels, logits)
print(loss.numpy()) # 0.35197204

How is categorical cross-entropy calculated when logits and labels have different sizes?

logits = tf.constant([[-3.27133679, -22.6687183, -4.15501118, -5.14916372, -5.94609261,
                       -6.93373299, -5.72364092, -9.75725174, -3.15748906, -4.84012318],
                      [-11.7642536, -45.3370094, -3.17252636, 4.34527206, -17.7164974,
                      -0.595088899, -17.6322937, -2.36941719, -6.82157373, -3.47369862],
                      [-4.55468369, -1.07379043, -3.73261762, -7.08982277, -0.0288562477, 
                       -5.46847963, -0.979336262, -3.03667569, -3.29502845, -2.25880361]])
labels = tf.constant([2, 3, 4])

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True,
                                                            reduction='none')
loss = loss_object(labels, logits)
print(loss.numpy()) # [2.0077195  0.00928135 0.6800677 ]
print(tf.reduce_mean(loss).numpy()) # 0.8990229

What I mean is: how do I get the same result ([2.0077195 0.00928135 0.6800677]) by hand?

@OverLordGoldDragon's answer is correct. In TF 2.0 it looks like this:

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')
loss = loss_object(labels, logits)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')

one_hot_labels = tf.one_hot(labels, 10)

preds = tf.nn.softmax(logits)
preds /= tf.math.reduce_sum(preds, axis=-1, keepdims=True)
loss = tf.math.reduce_sum(tf.math.multiply(one_hot_labels, -tf.math.log(preds)), axis=-1)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')
# [2.0077195  0.00928135 0.6800677 ]
# 2.697068691253662
# [2.0077198  0.00928142 0.6800677 ]
# 2.697068929672241

For a language model:

vocab_size = 9
seq_len = 6
batch_size = 2

labels = tf.reshape(tf.range(batch_size*seq_len), (batch_size,seq_len)) # (2, 6)
logits = tf.random.normal((batch_size,seq_len,vocab_size)) # (2, 6, 9)

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')
loss = loss_object(labels, logits)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')

one_hot_labels = tf.one_hot(labels, vocab_size)

preds = tf.nn.softmax(logits)
preds /= tf.math.reduce_sum(preds, axis=-1, keepdims=True)
loss = tf.math.reduce_sum(tf.math.multiply(one_hot_labels, -tf.math.log(preds)), axis=-1)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')
# [[1.341706  3.2518263 2.6482694 3.039099  1.5835983 4.3498387]
#  [2.67237   3.3978183 2.8657475       nan       nan       nan]]
# nan
# [[1.341706  3.2518263 2.6482694 3.039099  1.5835984 4.3498387]
#  [2.67237   3.3978183 2.8657475 0.        0.        0.       ]]
# 25.1502742767334
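
Note that the labels here run from 0 to 11 while vocab_size is 9, so the last three labels are out of range: SparseCategoricalCrossentropy returns nan for them, whereas tf.one_hot maps out-of-range indices to all-zero vectors, so the manual loss is 0 at those positions, which is why the two sums differ.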

SparseCategoricalCrossentropy is CategoricalCrossentropy that takes integer labels instead of one-hot labels. Example from the source code; the two below are equivalent:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# (TF 1.x graph-mode example) the functional losses accept from_logits
# in the call and return per-sample values
scce = tf.keras.losses.sparse_categorical_crossentropy
cce  = tf.keras.losses.categorical_crossentropy

labels_scce = K.variable([[0, 1, 2]]) 
labels_cce  = K.variable([[1,    0,  0], [0,    1,  0], [0,   0,   1]])
preds       = K.variable([[.90,.05,.05], [.50,.89,.60], [.05,.01,.94]])

loss_cce  = cce(labels_cce,   preds, from_logits=False)
loss_scce = scce(labels_scce, preds, from_logits=False)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([loss_cce, loss_scce])

print(K.get_value(loss_cce))
print(K.get_value(loss_scce))
# [0.10536055  0.8046684  0.0618754]
# [0.10536055  0.8046684  0.0618754]

As for how it's done "by hand", we can refer to the Numpy backend:

np_labels = K.get_value(labels_cce)
np_preds  = K.get_value(preds)

losses = []
for label, pred in zip(np_labels, np_preds):
    pred /= pred.sum(axis=-1, keepdims=True)
    losses.append(np.sum(label * -np.log(pred), axis=-1, keepdims=False))
print(losses)
# [0.10536055  0.8046684  0.0618754]
  • from_logits = True: preds is the model output before it is passed to softmax (so we pass it through softmax)
  • from_logits = False: preds is the model output after it has been passed to softmax (so we skip that step); see the short sketch after these bullets
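
A minimal sketch of the two cases, with hypothetical logits and labels (assuming TF 2.x eager execution):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # hypothetical pre-softmax model output
labels = tf.constant([0])

# from_logits=True: pass raw logits; the loss applies softmax internally
scce_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(scce_logits(labels, logits).numpy())   # ~0.417

# from_logits=False: pass probabilities (softmax already applied), so the loss skips it
probs = tf.nn.softmax(logits)
scce_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
print(scce_probs(labels, probs).numpy())     # same value, ~0.417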

To summarize, to compute it by hand (a NumPy sketch of these steps follows the list):

  1. Convert the integer labels to one-hot labels
  2. If preds are model outputs before softmax, compute their softmax
  3. pred /= ... normalizes the predictions before taking the log; this way, high-probability predictions on zero-labels penalize correct predictions on one-labels. If from_logits = False, this step is skipped, since softmax already does the normalization. See this snippet for further reading
  4. For each observation/sample, compute the element-wise negative log (base e) only where label == 1
  5. Take the mean of the losses over all observations
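
A compact NumPy sketch of these five steps, using small hypothetical logits and integer labels (not the values from the question above):

import numpy as np

# hypothetical pre-softmax model outputs and integer labels
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
num_classes = logits.shape[-1]

# 1. convert integer labels to one-hot labels
one_hot = np.eye(num_classes)[labels]

# 2. preds are pre-softmax outputs, so compute their softmax
exps = np.exp(logits - logits.max(axis=-1, keepdims=True))
preds = exps / exps.sum(axis=-1, keepdims=True)

# 3. normalize predictions before taking the log (a no-op here, since softmax already sums to 1)
preds /= preds.sum(axis=-1, keepdims=True)

# 4. element-wise negative log (base e), only where label == 1
per_sample_loss = np.sum(one_hot * -np.log(preds), axis=-1)

# 5. mean of the losses over all observations
print(per_sample_loss, per_sample_loss.mean())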

Lastly, the mathematical formula for categorical cross-entropy is:

-\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} 1_{y_i \in C_c} \log p_{model}[y_i \in C_c]

  • i iterates over the N observations
  • c iterates over the C classes
  • 1 is the indicator function - here, analogous to binary cross-entropy, except it operates on length-C vectors
  • p_model[y_i \in C_c] - the predicted probability that observation i belongs to class c

Reference for further information:

https://missingueverymoment.wordpress.com/2019/10/21/cross-entropy-and-maximum-likelihood-estimation/

Cross-entropy is simply the sum of negative log-probabilities.
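
For example, with hypothetical normalized class probabilities, the per-sample loss is just the negative log of the probability assigned to the true class, and summing (or averaging) over samples gives the batch loss:

import numpy as np

# hypothetical normalized class probabilities and integer labels
preds  = np.array([[0.90, 0.05, 0.05],
                   [0.05, 0.89, 0.06],
                   [0.05, 0.01, 0.94]])
labels = np.array([0, 1, 2])

# per-sample cross-entropy = -log(probability of the true class)
per_sample = -np.log(preds[np.arange(len(labels)), labels])
print(per_sample, per_sample.sum())
# ~[0.1054 0.1165 0.0619], sum ~0.2838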

