How to train machine learning model with FastText output

Is there a FastText method that returns output in the following format (<1x10000 sparse matrix of type '<class 'numpy.float64'>' with 67 stored elements in Compressed Sparse Row format>) from the FastText output below, or some other way I can train my ML model on it? When I used TF-IDF I got a sparse matrix and trained the ML model on that, but now I want to train the model with FastText.

fasttext_out=model_ted.wv.most_similar("The Lemon Drop Kid , a New York City swindler, is illegally touting horses at a Florida racetrack. After several successful hustles, the Kid comes across a beautiful, but gullible, woman intending to bet a lot of money. The Kid convinces her to switch her bet, employing a prefabricated con. Unfortunately for the Kid, the woman belongs to notorious gangster Moose Moran , as does the money. The Kid's choice finishes dead last and a furious Moran demands the Kid provide him with $10,000  by Christmas Eve, or the Kid won't make it to New Year's. The Kid decides to return to New York to try to come up with the money. He first tries his on-again, off-again girlfriend Brainy Baxter . However, when talk of long-term commitment arises, the Kid quickly makes an escape.")

model_ted.wv.most_similar("school")

Output:

[('Psycho-biddy', 0.9323669672012329),
 ('Slasher', 0.8850599527359009),
 ('Demonic child', 0.8805997967720032),
 ('Giallo', 0.8504119515419006),
 ('Road-Horror', 0.821454644203186),
 ('Anthology', 0.8191317915916443),
 ('Czechoslovak New Wave', 0.8187490105628967),
 ('Supernatural', 0.813347339630127),
 ('Psychological thriller', 0.8018383979797363),
 ('Kitchen sink realism', 0.8017964959144592)]

My main intention is to convert the output into vectors and train a machine learning model on them. Please confirm.

My answer to your previous similar question still applies, specifically:

FastText inherently only gives you word-vectors: a vector per word. If you want a vector for a longer run of text, made up of many words, you'll need to make more decisions about how to turn a bunch of individual word-vectors into something else.

A reasonable first thing to try is simply averaging the vectors of all those words together. (There are many other ways to represent larger texts as vectors, or bags-of-other-values.)

You could then try passing those averages as the features to a downstream classifier.
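As a rough illustration of that averaging step, here is a minimal sketch. It assumes only that you have some lookup object supporting `lookup[word]`, which a gensim model's `model.wv` does. The helper names `average_vector` and `texts_to_features`, the whitespace tokenization, and the toy vectors are all made up for this example, not part of FastText or gensim.

```python
import numpy as np

def average_vector(tokens, word_vectors, dim):
    """Average the vectors of `tokens`; return a zero vector if none are found."""
    vectors = []
    for token in tokens:
        try:
            vectors.append(word_vectors[token])
        except KeyError:
            # Skip tokens the lookup cannot supply a vector for.
            continue
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

def texts_to_features(texts, word_vectors, dim):
    """Turn a list of raw strings into a dense (n_texts, dim) feature matrix."""
    # Assumption: naive lowercase + whitespace tokenization; use the same
    # tokenization that was applied to the model's training data instead.
    rows = [average_vector(text.lower().split(), word_vectors, dim) for text in texts]
    return np.vstack(rows)

# Toy stand-in for a trained model's word vectors (hypothetical values).
toy_vectors = {
    "kid": np.array([1.0, 0.0, 0.0]),
    "horse": np.array([0.0, 1.0, 0.0]),
    "money": np.array([0.0, 0.0, 1.0]),
}

X = texts_to_features(["Kid money", "horse horse", "unknown words"], toy_vectors, dim=3)
print(X.shape)
```

With a real model you would presumably pass `model_ted.wv` and `model_ted.wv.vector_size` in place of the toy dictionary and `dim=3`. The result is a dense array rather than a sparse matrix like the TF-IDF one, but most scikit-learn classifiers accept dense input, so something like `LogisticRegression().fit(X, y)` should work with it.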

Separately, as also pointed out in that prior answer, you won't get a meaningful set of .most_similar() results if you pass a long string like your example code shows.
