Dimensional difference between running mean and sample mean in batch normalization
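I have recently been working through cs231n online on my own, and I am stuck on the batch normalization assignment, specifically the running mean update: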
running_mean = momentum * running_mean + (1 - momentum) * sample_mean
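where running_mean is fetched with
running_mean = bn_param.get("running_mean", np.zeros(D, dtype=x.dtype))
So when there are multiple batchnorm layers, the running_mean value is carried over from the previous batchnorm layer, while sample_mean is computed from the current layer's input, which leads to: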
File ~/assignment/assignment2/cs231n/layers.py:217, in batchnorm_forward(x, gamma, beta, bn_param)
213 out = x_hat * gamma + beta
215 print(running_mean.shape, miu.shape)
--> 217 running_mean = momentum * running_mean + (1 - momentum) * miu
218 running_var = momentum * running_var + (1 - momentum) * sigma_squared
220 cache = miu, sigma_squared, eps, N, x_hat, x, gamma
ValueError: operands could not be broadcast together with shapes (1,20) (1,30)
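What am I missing here? The derivation seems correct.
I tried to implement the batchnorm layer, but running_mean and sample_mean end up with different dimensions.
This is what I have: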
miu = np.mean(x, axis=0)
var = np.var(x, axis=0)
x_hat = (x - miu) / np.sqrt(var + eps)
out = x_hat * gamma + beta
print(running_mean.shape, miu.shape)
running_mean = momentum * running_mean + (1 - momentum) * miu
running_var = momentum * running_var + (1 - momentum) * var
cache = miu, var, eps, N, x_hat, x, gamma
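The mismatch described above can be reproduced with a minimal sketch. It assumes that both layers are handed the same bn_param dict, which the question does not show, so treat that as a guess about the cause rather than a confirmed diagnosis. The function name batchnorm_forward_train, the stored keys, and the layer widths 20 and 30 (taken from the error message) are illustrative choices, not the assignment's actual code.

```python
import numpy as np

def batchnorm_forward_train(x, gamma, beta, bn_param):
    """Train-mode forward pass mirroring the code in the question (hypothetical helper)."""
    eps = bn_param.get("eps", 1e-5)
    momentum = bn_param.get("momentum", 0.9)
    N, D = x.shape
    # Falls back to zeros of width D only if nothing has been stored yet.
    running_mean = bn_param.get("running_mean", np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get("running_var", np.zeros(D, dtype=x.dtype))

    miu = np.mean(x, axis=0)
    var = np.var(x, axis=0)
    x_hat = (x - miu) / np.sqrt(var + eps)
    out = x_hat * gamma + beta

    running_mean = momentum * running_mean + (1 - momentum) * miu
    running_var = momentum * running_var + (1 - momentum) * var
    # Store the updated statistics so the next call on this dict reuses them.
    bn_param["running_mean"] = running_mean
    bn_param["running_var"] = running_var
    return out

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 20))  # input to a layer of width 20
x2 = rng.standard_normal((4, 30))  # input to a layer of width 30

# Assumed failure mode: one dict shared by both layers.
shared = {}
batchnorm_forward_train(x1, np.ones(20), np.zeros(20), shared)
try:
    batchnorm_forward_train(x2, np.ones(30), np.zeros(30), shared)
    shared_failed = False
except ValueError:
    # The stored running_mean has width 20, the new sample mean has width 30.
    shared_failed = True

# One dict per layer keeps each layer's statistics at its own width.
per_layer = [{}, {}]
batchnorm_forward_train(x1, np.ones(20), np.zeros(20), per_layer[0])
batchnorm_forward_train(x2, np.ones(30), np.zeros(30), per_layer[1])
```

If the shared-dict assumption holds, giving every batchnorm layer its own bn_param dict should keep running_mean and the sample mean at the same width within each layer.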