Different `grad_fn` for similar-looking operations in PyTorch (1.0)

Question

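I am working on an attention model, and before running the final model I am tracing the tensor shapes that flow through the code. One operation requires reshaping a tensor. The tensor has shape torch.Size([30, 8, 9, 64]), where 30 is the batch_size, 8 is the number of attention heads (not relevant to my question), 9 is the number of words in the sentence, and 64 is some intermediate embedding representation of each word. Before further processing, I have to reshape the tensor to torch.Size([30, 9, 512]). The references I found online do x.transpose(1, 2).contiguous().view(30, -1, 512), whereas I thought x.transpose(1, 2).reshape(30, -1, 512) should work.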

In the first case the grad_fn is <ViewBackward>, whereas in my case it is <UnsafeViewBackward>. Aren't these two the same operation? Will this cause an error during training?

Answer 1

Aren't these two the same operation?

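No. Although they effectively produce the same tensor, the operations are not the same, and the results are not guaranteed to share the same storage.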
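To make the "same tensor, different operation" point concrete, here is a minimal sketch that runs both variants from the question on a random input and compares the values, the gradients reaching `x`, and the recorded `grad_fn` names. The input `x` and the `upstream` gradient are made up for illustration, and the exact `grad_fn` names printed may differ between PyTorch versions.

```python
import torch

# Hypothetical input with the shape from the question:
# (batch_size=30, heads=8, words=9, head_dim=64).
x = torch.randn(30, 8, 9, 64, requires_grad=True)

# Variant from the online references: force a contiguous copy, then view.
a = x.transpose(1, 2).contiguous().view(30, -1, 512)

# Variant from the question: let reshape decide whether a copy is needed.
b = x.transpose(1, 2).reshape(30, -1, 512)

same_shape = a.shape == b.shape == torch.Size([30, 9, 512])
same_values = torch.equal(a, b)

# Send the same (arbitrary) upstream gradient back through each path.
upstream = torch.randn(30, 9, 512)
(grad_a,) = torch.autograd.grad(a, x, upstream)
(grad_b,) = torch.autograd.grad(b, x, upstream)
same_grads = torch.equal(grad_a, grad_b)

# The autograd node types differ even though values and gradients agree.
print(type(a.grad_fn).__name__, type(b.grad_fn).__name__)
print(same_shape, same_values, same_grads)
```

If the three flags come out true, the differing `grad_fn` reflects only how autograd recorded the step, not a difference in the values used for training.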

TensorShape.cpp:

// _unsafe_view() differs from view() in that the returned tensor isn't treated
// as a view for the purposes of automatic differentiation. (It's not listed in
// VIEW_FUNCTIONS in gen_autograd.py).  It's only safe to use if the `self` tensor
// is temporary. For example, the viewed tensor here (a + b) is discarded immediately
// after viewing:
//
//  res = at::_unsafe_view(a + b, size);
//
// This is a hack because in-place operations on tensors treated like views
// can be much more expensive than the same operations on non-view tensors.

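Note that this may produce an error if applied to complex inputs, but complex inputs are generally not yet fully supported in PyTorch, and this is not unique to this function.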
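The storage point and the "temporary tensor" case in the quoted comment can be observed from Python. The sketch below uses a made-up input and relies only on the documented behaviour that `reshape` returns a view when it can and a copy otherwise. Comparing `data_ptr()` values is just one simple way to check whether two tensors start at the same memory.

```python
import torch

# Hypothetical input with the shape from the question.
x = torch.randn(30, 8, 9, 64)

# Contiguous input: reshape can return a view that shares memory with x.
r1 = x.reshape(30, 8, 576)
shares_when_contiguous = r1.data_ptr() == x.data_ptr()

# Transposed input: the head and embedding dimensions are no longer adjacent
# in memory, so merging them into 512 requires copying the data.
t = x.transpose(1, 2)
r2 = t.reshape(30, -1, 512)
shares_after_transpose = r2.data_ptr() == x.data_ptr()

# view never copies, so it raises an error on the same transposed tensor
# unless .contiguous() is called first.
try:
    t.view(30, -1, 512)
    view_failed = False
except RuntimeError:
    view_failed = True

print(shares_when_contiguous, shares_after_transpose, view_failed)
```

In the transposed case the result lives in freshly copied memory that nothing else refers to, which appears to be the situation the comment describes as safe for `_unsafe_view`.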

Solution 1
1 2021-05-03 14:53:16
