How can I recover the likelihood of a certain word appearing in a given context from word embeddings?

Original post 2019-09-23 09:40:18 | Tags: nlp, word-embedding, word-sense-disambiguation

I know that some methods of generating word embeddings (e.g. CBOW) are based on predicting the likelihood of a given word appearing in a given context. I'm working with Polish, which is sometimes ambiguous with respect to segmentation: e.g. 'Coś' can be treated either as one word or as two conjoined words ('Co' + '-ś'), depending on the context. What I want to do is create a context-sensitive tokenizer. Assuming that I have the vector representation of the preceding context and all possible segmentations, could I somehow calculate or approximate the likelihood of particular words appearing in this context?

1 answer

This depends very much on how you obtained your embeddings. The CBOW model has two sets of parameters: the embedding matrix, denoted v, and the output projection matrix, denoted v'. If you want to recover the probabilities that the CBOW model uses at training time, you need v' as well. See equation (2) in the word2vec paper. Tools for pre-computing word embeddings usually don't export v', so you would need to modify them yourself.
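To make the role of v and v' concrete, here is a minimal sketch of a CBOW-style full softmax in plain Python. It assumes you already have both matrices as lists of rows. The toy vocabulary and the numbers in `v_in` and `v_out` are made up for illustration, and the function name `cbow_probabilities` is hypothetical. Real word2vec training normally uses negative sampling or hierarchical softmax instead of a full softmax, so the numbers you get this way only approximate what the model optimised.

```python
import math

def cbow_probabilities(context_ids, v_in, v_out):
    """Return P(word | context) for every vocabulary word (full softmax)."""
    dim = len(v_in[0])
    # Hidden vector: average of the input embeddings (v) of the context words.
    h = [sum(v_in[i][d] for i in context_ids) / len(context_ids)
         for d in range(dim)]
    # Score of each candidate word: dot product of its output vector (v') with h.
    scores = [sum(row[d] * h[d] for d in range(dim)) for row in v_out]
    # Numerically stable softmax over the whole vocabulary.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data, invented for illustration only.
vocab = ["co", "ś", "coś", "to"]
v_in = [[0.1, 0.3], [0.0, -0.2], [0.4, 0.1], [0.2, 0.2]]    # embedding matrix v
v_out = [[0.5, -0.1], [-0.3, 0.2], [0.1, 0.4], [0.0, 0.0]]  # output matrix v'

# Context consists of the words "co" and "to" (indices 0 and 3).
probs = cbow_probabilities([0, 3], v_in, v_out)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

Without v' the second step cannot be carried out, which is why a plain table of embeddings is not enough to recover these probabilities.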

Anyway, if you want to compute the probability of a word given a context, you would be better off using a (neural) language model than a table of word embeddings. If you search the Internet, I am sure you will find something that suits your needs.
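As a rough illustration of the language-model route, the sketch below scores two candidate segmentations given the preceding context. To keep it dependency-free, it swaps in a count-based bigram model with add-one smoothing in place of a neural one. The tiny corpus and the function names `train_bigram` and `log_prob` are invented for this example. A real system would train on a large segmented corpus or use a pretrained neural language model.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens  # sentence-start marker
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(tokens, context, unigrams, bigrams):
    """Log-probability of `tokens` following `context`, add-one smoothed."""
    vocab_size = len(unigrams)
    prev = context[-1] if context else "<s>"
    total = 0.0
    for tok in tokens:
        total += math.log((bigrams[(prev, tok)] + 1)
                          / (unigrams[prev] + vocab_size))
        prev = tok
    return total

# Toy segmented corpus, invented for illustration only.
corpus = [["widzę", "coś"], ["mam", "coś"], ["co", "ś", "zrobił"]]
unigrams, bigrams = train_bigram(corpus)

context = ["mam"]
candidates = [["coś"], ["co", "ś"]]
best = max(candidates, key=lambda c: log_prob(c, context, unigrams, bigrams))
print(best)
```

Note that a segmentation with more tokens multiplies more probabilities together, so this raw score is biased towards shorter segmentations. Normalising by length or scoring the rest of the sentence as well are possible remedies.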

How can I get RoBERTa word embeddings?

Word2vec word embeddings: how to have different embeddings to different words coming in same context?

How to get word embeddings back from Keras?

Do BERT word embeddings change depending on context?

How word embeddings work for word similarity?

How to normalize word embeddings (word2vec)

How to store Word vector Embeddings?

How to interpret CBOW word embeddings?

Concatenate char embeddings and word embeddings

Am I using word-embeddings correctly?

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 0 2019-09-23 10:52:13