Tag [gru] - Stack Overflow

LSTM overfitting problem for all my results. Can someone examine my code for any errors?

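Thank you for taking the time to consider my question. I have a problem with my LSTM: it overfits on all of my results. I have tried different techniques. Can someone check my code to see whether I have written any lines incorrectly? ...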

Are torch.nn.ReLU and torch.nn.Sigmoid trainable?

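I built a simple GRU model with PyTorch. It consists of 4 submodules. I noticed that some of the dictionaries returned by state_dict() are empty after training, while other submodules definitely have weights and biases. Code: In an actual run, the state_dict of the submodules self.gru_m and self.output_f have ...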
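The behaviour described in this excerpt can be reproduced with a minimal sketch. The question's own code is truncated, so the class and attribute names below (TinyModel, gru, act, gate, out) are hypothetical stand-ins, not the asker's model. The sketch only counts the entries each submodule contributes to state_dict().

```python
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical 4-submodule model; names are illustrative only."""

    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
        self.act = nn.ReLU()      # pure function of its input, no weights
        self.gate = nn.Sigmoid()  # pure function of its input, no weights
        self.out = nn.Linear(8, 1)


model = TinyModel()

# Number of state_dict entries (parameters and buffers) per direct submodule.
param_counts = {
    name: len(module.state_dict()) for name, module in model.named_children()
}
print(param_counts)
```

In this sketch the GRU and Linear submodules report entries (weight and bias tensors), while ReLU and Sigmoid report none, because those activation modules hold no parameters to train. An empty state_dict for such a submodule is therefore expected rather than a sign that training failed.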

How can I initialize the hidden state (carry) of a (flax linen) GRUCell as a learnable parameter (e.g. using model.init)?

I am using Flax to create a GRU model in Jax and initialize the model parameters with model.init, as follows: import jax.numpy as np from jax import random import flax.linen as nn from jax.nn impor ...

How to interpret get_weights for Keras GRU?

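I cannot interpret the result of get_weights from a GRU layer. Here is my code - I am familiar with the GRU concept. I also understand how get_weights works for the Keras SimpleRNN layer, where the first array holds the input weights, the second the activation weights, and the third the bias. However, I am lost with the output for GRU, shown below ...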
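As a rough illustration of what get_weights returns for a GRU layer, the sketch below builds a small layer and inspects the array shapes. The sizes (units=3, input_dim=2) are arbitrary choices, not taken from the question, and reset_after=True is set explicitly because the bias layout depends on it. The gate order assumed in the comments (update z, reset r, candidate h) follows the Keras GRU implementation.

```python
import numpy as np
import tensorflow as tf

units, input_dim = 3, 2  # arbitrary illustrative sizes

gru = tf.keras.layers.GRU(units, reset_after=True)
# Call the layer once on dummy data so that its weights are created.
gru(np.zeros((1, 4, input_dim), dtype="float32"))

kernel, recurrent_kernel, bias = gru.get_weights()
weight_shapes = (kernel.shape, recurrent_kernel.shape, bias.shape)
print(weight_shapes)

# Each matrix stacks the three gates along the last axis: z, r, h.
W_z, W_r, W_h = np.split(kernel, 3, axis=1)            # input weights per gate
U_z, U_r, U_h = np.split(recurrent_kernel, 3, axis=1)  # recurrent weights per gate
# With reset_after=True the bias has two rows: input bias and recurrent bias.
b_input, b_recurrent = bias
```

Under these assumptions the kernel has shape (input_dim, 3 * units), the recurrent kernel (units, 3 * units), and the bias (2, 3 * units). With reset_after=False the bias would be a single vector of length 3 * units instead.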

How to get sequence, hidden state and cell state for Bidirectional GRU?

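ValueError: not enough values to unpack (expected 5, got 3). Why are there only 3? Does it concatenate the forward and backward states internally? So is it output, fwd_h, bwd_h, or output, hidden state, cell state? ...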
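The excerpt does not name the framework, so the following sketch assumes the Keras Bidirectional wrapper around a GRU layer; the layer size and input shape are arbitrary. It shows the three values returned when both return_sequences and return_state are enabled.

```python
import numpy as np
import tensorflow as tf

# Assumption: Keras Bidirectional wrapper with the default merge_mode="concat".
layer = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(8, return_sequences=True, return_state=True)
)

x = np.zeros((2, 5, 4), dtype="float32")  # (batch, timesteps, features)

# A GRU has a hidden state but no separate cell state, so each direction
# contributes one state tensor: output sequence + forward h + backward h.
outputs, forward_h, backward_h = layer(x)

shapes = (tuple(outputs.shape), tuple(forward_h.shape), tuple(backward_h.shape))
print(shapes)
```

Under this assumption, unpacking five values only works for a bidirectional LSTM, where each direction returns both a hidden state and a cell state. For a GRU the three values are the merged output sequence and the final hidden state of each direction.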

Replace bidirectional LSTM with GRU in coref?

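I am using the template configuration from bert_lstm.jsonnet to train Allennlp's coarse-to-fine coreference model (for a language other than English). When I replace the context layer's type "lstm" with "gru", it works, but it seems to have very little effect on training. Each epoch consumes the same 63 GB of RAM, and the validation f1-score ...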