
How to train a machine learning model with FastText output

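Is there any FastText method that lets me train my ML model from the FastText output below, or any other way to do so? When I used TF-IDF I got a sparse matrix and trained an ML model on it, but now I want to train the model with FastText.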

fasttext_out=model_ted.wv.most_similar("The Lemon Drop Kid , a New York City swindler, is illegally touting horses at a Florida racetrack. After several successful hustles, the Kid comes across a beautiful, but gullible, woman intending to bet a lot of money. The Kid convinces her to switch her bet, employing a prefabricated con. Unfortunately for the Kid, the woman belongs to notorious gangster Moose Moran , as does the money. The Kid's choice finishes dead last and a furious Moran demands the Kid provide him with $10,000  by Christmas Eve, or the Kid won't make it to New Year's. The Kid decides to return to New York to try to come up with the money. He first tries his on-again, off-again girlfriend Brainy Baxter . However, when talk of long-term commitment arises, the Kid quickly makes an escape.")

model_ted.wv.most_similar("school")

Output:

[('Psycho-biddy', 0.9323669672012329),
 ('Slasher', 0.8850599527359009),
 ('Demonic child', 0.8805997967720032),
 ('Giallo', 0.8504119515419006),
 ('Road-Horror', 0.821454644203186),
 ('Anthology', 0.8191317915916443),
 ('Czechoslovak New Wave', 0.8187490105628967),
 ('Supernatural', 0.813347339630127),
 ('Psychological thriller', 0.8018383979797363),
 ('Kitchen sink realism', 0.8017964959144592)]

My main goal is to convert the output into vectors and train a machine learning model on them. Please advise.

My answer to your previous, similar question still applies. Specifically:

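FastText essentially gives you only word vectors: one vector per word. If you want a single vector for a longer run of text, say many words, you need to make further decisions about how to turn a bunch of individual word vectors into something else.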

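One simple thing to try is a good starting point: average all of those word vectors together. (There are many other ways to represent a larger text as a vector or other bag of values.)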
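
As a minimal sketch of the averaging idea, the snippet below tokenizes a text, looks up each token, and takes the mean of the vectors it finds. The helper name `average_vector`, the toy dictionary `toy_vectors`, and the whitespace tokenization are all illustrative assumptions, not part of FastText or gensim. With a trained gensim model, something like `model_ted.wv` could presumably stand in for the dictionary, and `model_ted.wv.vector_size` for `dim`, since it supports `in` and `[]` lookups by word.

```python
import numpy as np

def average_vector(tokens, word_vectors, dim):
    """Mean of the vectors of the tokens found in word_vectors; zeros if none are found."""
    found = [word_vectors[t] for t in tokens if t in word_vectors]
    if not found:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(found, axis=0)

# Toy stand-in for a trained model's word vectors (hypothetical values, 3 dimensions).
toy_vectors = {
    "kid": np.array([1.0, 0.0, 2.0], dtype=np.float32),
    "horse": np.array([3.0, 2.0, 0.0], dtype=np.float32),
}

doc = "The Kid bets on a horse"
tokens = doc.lower().split()  # naive whitespace tokenization, an assumption
doc_vec = average_vector(tokens, toy_vectors, dim=3)
print(doc_vec)  # mean of the "kid" and "horse" vectors
```

Tokens should be preprocessed the same way as the text the model was trained on; otherwise fewer words will match.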

You can then try passing those averaged vectors as features to a downstream classifier.
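
A rough sketch of that last step, assuming scikit-learn is the downstream library (the question does not say which one was used with TF-IDF). The random matrix `X` is only a synthetic stand-in for averaged document vectors stacked row by row, and the labels `y` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for averaged document vectors: 40 documents, 8 dimensions.
# In practice each row would be the averaged word vector of one document.
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(20, 8))
X_pos = rng.normal(loc=2.0, scale=0.5, size=(20, 8))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 20 + [1] * 20)  # made-up class labels

# The classifier takes a dense (n_documents, n_dimensions) array,
# just as it would take a sparse TF-IDF matrix.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
train_accuracy = clf.score(X, y)
print(train_accuracy)
```

The only change from a TF-IDF pipeline is the feature matrix: it is dense and has one column per embedding dimension rather than one per vocabulary term.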

Also, as the previous answer pointed out, if you pass in one long string as your example code does, you won't get a meaningful set of .most_similar() results.

