
Use of fasttext Pre-trained word vector as embedding in tensorflow script

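Can I use fastText word vectors like the ones here: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md as the embedding vectors in a tensorflow script, instead of word2vec or GloVe, without using the fasttext library?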

When you use pre-trained word vectors, you can load them with the gensim library.

For your reference: https://blog.manash.me/how-to-use-pre-trained-word-vectors-from-facebooks-fasttext-a71e6d55f27

In [1]: from gensim.models import KeyedVectors

In [2]: jp_model = KeyedVectors.load_word2vec_format('wiki.ja.vec')

In [3]: jp_model.most_similar('car')
Out[3]: 
[('cab', 0.9970724582672119),
 ('tle', 0.9969051480293274),
 ('oyc', 0.99671471118927),
 ('oyt', 0.996662974357605),
 ('車', 0.99665766954422),
 ('s', 0.9966464638710022),
 ('新車', 0.9966358542442322),
 ('hice', 0.9966053366661072),
 ('otg', 0.9965877532958984),
 ('車両', 0.9965814352035522)]
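
If gensim is not an option either, the `.vec` files can be read directly, since they use the plain word2vec text format: a header line with the vocabulary size and the dimension, then one line per word followed by its vector components. The following is only a minimal sketch under that assumption; `load_vec` and the in-memory sample are hypothetical names and data for illustration, not part of gensim or fastText.

```python
import io


def load_vec(handle, limit=None):
    """Parse word2vec text format: 'count dim' header, then 'word v1 ... vdim'."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the declared vocabulary size
    words, vectors = [], []
    for line in handle:
        parts = line.rstrip().split(' ')
        if len(parts) < dim + 1:
            continue  # skip malformed or empty lines
        # The last `dim` fields are the vector; whatever precedes them is the word.
        words.append(' '.join(parts[:-dim]))
        vectors.append([float(x) for x in parts[-dim:]])
        if limit is not None and len(words) >= limit:
            break
    return words, vectors


# Tiny in-memory stand-in for a real file such as 'wiki.ja.vec'.
sample = io.StringIO("3 2\ncar 0.1 0.2\nbus 0.3 0.4\n車 0.5 0.6\n")
words, vectors = load_vec(sample)
```

With a real file you would pass `open('wiki.ja.vec', encoding='utf-8')` instead of the sample, and probably set `limit`, because the full files are large.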

EDIT

I created a new branch in my fork of cnn-text-classification-tf. Here is the link: https://github.com/satojkovic/cnn-text-classification-tf/tree/use_fasttext

In this branch, there are three modifications to use fasttext.

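  1. Extract the vocab and the word_vec from fasttext. (util_fasttext.py)
import pickle
import numpy as np
from gensim.models import KeyedVectors

# Load the pre-trained vectors (word2vec text format).
model = KeyedVectors.load_word2vec_format('wiki.en.vec')
# gensim 3.x API; gensim 4 uses `key_to_index` / `get_vector` instead.
vocab = model.vocab
embeddings = np.array([model.word_vec(k) for k in vocab.keys()])

# Save the vocabulary and the embedding matrix for the training script.
with open('fasttext_vocab_en.dat', 'wb') as fw:
    pickle.dump(vocab, fw, protocol=pickle.HIGHEST_PROTOCOL)
np.save('fasttext_embedding_en.npy', embeddings)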
  2. Embedding layer

    W is initialized with zeros, then an embedding_placeholder is set up to receive the word_vec, and finally the placeholder's value is assigned to W. (text_cnn.py)

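# Non-trainable variable that will hold the pre-trained vectors.
W_ = tf.Variable(
    tf.constant(0.0, shape=[vocab_size, embedding_size]),
    trainable=False,
    name='W')

# Placeholder through which the pre-trained matrix is fed at run time.
self.embedding_placeholder = tf.placeholder(
    tf.float32, [vocab_size, embedding_size],
    name='pre_trained')

# Copy the fed matrix into the variable.
W = tf.assign(W_, self.embedding_placeholder)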
  3. Use the vocab and the word_vec

    The vocab is used to build the word-id maps, and the word_vec is fed into the embedding_placeholder.

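# Load the vocabulary and the embedding matrix saved by util_fasttext.py.
with open('fasttext_vocab_en.dat', 'rb') as fr:
    vocab = pickle.load(fr)
embedding = np.load('fasttext_embedding_en.npy')

# Build the word-id mapping from the fasttext vocabulary, then encode the text.
pretrain = vocab_processor.fit(vocab.keys())
x = np.array(list(vocab_processor.transform(x_text)))
# Feed the pre-trained matrix together with each batch.
feed_dict = {
    cnn.input_x: x_batch,
    cnn.input_y: y_batch,
    cnn.dropout_keep_prob: FLAGS.dropout_keep_prob,
    cnn.embedding_placeholder: embedding
}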
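
One thing this step relies on is that the id given to each word matches the row of that word's vector in the embedding matrix. Whether vocab_processor assigns ids in exactly the saved order is worth verifying in your own setup. As a rough illustration of the alignment, here is a plain-Python sketch; `build_lookup`, `encode`, the `<UNK>` token, and the choice to reserve row 0 for unknown or padding tokens are all assumptions of the sketch, not taken from the branch.

```python
def build_lookup(words, vectors, dim):
    """Return (word_to_id, matrix) where matrix[word_to_id[w]] is the vector of w."""
    # Row 0 is reserved for padding/unknown tokens (an assumption of this sketch).
    word_to_id = {"<UNK>": 0}
    matrix = [[0.0] * dim]
    for word, vec in zip(words, vectors):
        if word in word_to_id:
            continue  # keep the first vector seen for a duplicated word
        word_to_id[word] = len(matrix)
        matrix.append(list(vec))
    return word_to_id, matrix


def encode(tokens, word_to_id, max_len):
    """Map tokens to ids, truncating or zero-padding to max_len."""
    ids = [word_to_id.get(t, 0) for t in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Toy vocabulary and vectors standing in for the fasttext data.
words = ["car", "bus"]
vectors = [[0.1, 0.2], [0.3, 0.4]]
word_to_id, matrix = build_lookup(words, vectors, dim=2)

ids = encode("the car stops".split(), word_to_id, max_len=4)
# What an embedding lookup does: pick one matrix row per id.
embedded = [matrix[i] for i in ids]
```

In the TensorFlow script, `matrix` converted to a NumPy array would play the role of the value fed to embedding_placeholder, and `ids` the role of one row of input_x.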

Please try it out.


