Could I use BERT to cluster phrases with a pre-trained model?
I tried using Gensim with the GoogleNews pre-trained model to cluster phrases, but it failed. I was advised that the GoogleNews model does not contain my phrases. The phrases I have are a little too specific for the GoogleNews model, and I don't have a corpus to train a new model; I have only the phrases. So now I am considering turning to BERT. But can BERT do what I expect, as described above? Thank you.
You can feed a phrase into the pretrained BERT model and get an embedding, i.e. a fixed-dimension vector. So BERT can embed your phrases in a vector space. Then you can use a clustering algorithm (such as k-means) to cluster the phrases. The phrases do not need to occur in BERT's training corpus, as long as the words they consist of are in the vocabulary. You will have to experiment to see whether the embeddings give you relevant results.
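The pipeline above (embed each phrase with BERT, then run k-means over the vectors) can be sketched as follows. This is a minimal sketch, assuming the `transformers`, `torch`, and `scikit-learn` packages are installed; `bert-base-uncased` is just an example checkpoint (its weights download on first use), and the example phrases are illustrative.

```python
# Sketch: cluster phrases by embedding them with a pretrained BERT model,
# then grouping the fixed-dimension vectors with k-means.
import numpy as np
from sklearn.cluster import KMeans


def embed_phrases(phrases, model_name="bert-base-uncased"):
    """Return one fixed-dimension vector per phrase (mean-pooled BERT output)."""
    # Imported here so the clustering helper below is usable without torch.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    with torch.no_grad():
        batch = tokenizer(phrases, padding=True, truncation=True,
                          return_tensors="pt")
        hidden = batch and model(**batch).last_hidden_state  # (n, seq_len, dim)
        mask = batch["attention_mask"].unsqueeze(-1)         # 1 for real tokens
        pooled = (hidden * mask).sum(1) / mask.sum(1)        # mean over tokens
    return pooled.numpy()


def cluster_embeddings(embeddings, n_clusters):
    """Group fixed-dimension vectors with k-means; returns one label per row."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(np.asarray(embeddings))


if __name__ == "__main__":
    phrases = ["machine learning", "deep learning", "apple pie", "banana bread"]
    labels = cluster_embeddings(embed_phrases(phrases), n_clusters=2)
    for phrase, label in zip(phrases, labels):
        print(label, phrase)
```

Mean pooling over the non-padding tokens is one common way to collapse BERT's per-token outputs into a single phrase vector; using the `[CLS]` token's vector is another option, and as the answer notes, you will have to check which gives relevant clusters for your phrases.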