PyTorch equivalent of TensorFlow Keras StringLookup?
I am currently using PyTorch, but it is missing a layer I need: tf.keras.layers.StringLookup, which helps with handling string ids. Is there a workaround to do something similar in PyTorch?
An example of the functionality I'm looking for:
vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "a", "b"]])
layer = tf.keras.layers.StringLookup(vocabulary=vocab)
layer(data)
Outputs:
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[1, 3, 4],
       [4, 1, 2]])>
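In the absence of a dedicated PyTorch layer, the same behavior can be reproduced with a plain Python dictionary. This is a minimal sketch (the helper name string_lookup is my own) that mirrors StringLookup's default indexing, where index 0 is reserved for out-of-vocabulary tokens:

```python
vocab = ["a", "b", "c", "d"]
data = [["a", "c", "d"], ["d", "a", "b"]]

# Index 0 is reserved for OOV tokens, mirroring StringLookup's default,
# so known tokens start at index 1.
table = {token: i + 1 for i, token in enumerate(vocab)}

def string_lookup(batch):
    # dict.get with default 0 maps unseen strings to the OOV index
    return [[table.get(token, 0) for token in row] for row in batch]

ids = string_lookup(data)
print(ids)  # [[1, 3, 4], [4, 1, 2]]
```

The nested list can then be handed to torch.tensor(ids) to obtain the same int64 tensor that the Keras layer returns.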
You can build a lookup from your vocabulary using collections.Counter and torchtext's vocab object. You can then pass sequences to it and get their encodings back as a tensor:
import torch
from collections import Counter
from torchtext.vocab import vocab
tokens = ["a", "b", "c", "d"]
samples = [["a", "c", "d"], ["d", "a", "b"]]
# Build string lookup
lookup = vocab(Counter(tokens))
>>> torch.tensor([lookup(s) for s in samples])
tensor([[0, 2, 3],
        [3, 0, 1]])
You can use the torchtext library; install it with python3 -m pip install torchtext. Then you can do something like this:
from torchtext.vocab import vocab
from collections import OrderedDict
tokens = ['a', 'b', 'c', 'd']
v1 = vocab(OrderedDict([(token, 1) for token in tokens]))
v1.lookup_indices(["a","b","c"])
This is the result:
[0, 1, 2]
You can also use the torchnlp package, installed via:
pip install pytorch-nlp
from torchnlp.encoders import LabelEncoder
data = ["a", "c", "d", "e", "d"]
encoder = LabelEncoder(data, reserved_labels=['unknown'], unknown_index=0)
enl = [encoder.encode(x) for x in data]
print(enl)
[tensor(1), tensor(2), tensor(3), tensor(4), tensor(3)]
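Going the other way (indices back to strings, which StringLookup(invert=True) does in Keras) only needs the reverse mapping. A minimal sketch, with the names index_to_token and inverse_lookup being my own:

```python
vocab = ["a", "b", "c", "d"]
# Index 0 is the OOV slot, so real tokens start at 1
index_to_token = {i + 1: token for i, token in enumerate(vocab)}

def inverse_lookup(ids):
    # Unknown indices come back as "[UNK]", like Keras's default OOV token
    return [[index_to_token.get(i, "[UNK]") for i in row] for row in ids]

print(inverse_lookup([[1, 3, 4], [4, 1, 2]]))  # [['a', 'c', 'd'], ['d', 'a', 'b']]
```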