
Text classification using Word2Vec

Original post: 2020-06-18 18:48:51 4 1 python / word2vec

I am having trouble understanding Word2Vec. I need to classify help desk tickets based on what users report in the help desk system. Each sentence has its own class.

I've seen some pre-trained word2vec files on the internet, but I don't know if they are the best way to go, since my problem is very specific and my dataset is in Portuguese.

I think I will have to create my own model, and I am unsure how to do that. Do I have to train it on the same words as the dataset that contains my sentences and classes?

The first line of the dataset holds the column titles; each line below it holds a sentence and its class. Could anyone help me? I saw that Gensim can create vector models, and it sounds good to me, but I am completely lost.

chamado,classe
'Prezados não estou conseguindo gerar uma nota fiscal do módulo de estoque e custos.','ERP GESTÃO',
'Não consigo acessar o ERP com meu usuário e senha.','ERP GESTÃO',
'Médico não consegue gerar receituário no módulo de Medicina e segurança do trabalho.','ERP GESTÃO',
'O produto 4589658 tinta holográfica não está disponível no EIC e não consigo gerar a PO.','ERP GESTÃO',

1 answer

Your question is very general. StackOverflow can usually help more once you've tried specific things and hit specific problems, so that you can provide exact code, errors, or shortfalls to ask about.

But in general:

You might not need word2vec at all: there are many text-classification approaches that, with sufficient training data, may assign your texts to helpful classes without using word-vectors. You will likely want to try those first, then consider word-vectors as a later improvement.
For word-vectors to be helpful, they need to be based on your actual language, and ideally also on your particular domain of concern. Generic word-vectors from news articles or even Wikipedia may not include the important lingo and word-senses for your problem. But it's not too hard to train your own word-vectors: you just need a lot of varied, relevant texts that use the words in realistic, relevant contexts. So yes, you'd ideally train your word-vectors on the same texts you eventually want to classify.

But mostly, if you're "totally lost", start with simpler text-classification examples. As you're using Python, examples based on scikit-learn may be most relevant. Adapt those to your data & goals, to familiarize yourself with all the steps & the ways of evaluating whether your changes are improving your end results or not. Then investigate techniques like word-vectors.
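A simple scikit-learn starting point might look like the sketch below: TF-IDF features fed into logistic regression, with cross-validation as the way to check whether a change helps. The choice of `TfidfVectorizer` and `LogisticRegression` is my assumption of a reasonable baseline, not the only option. The question shows only one class, so the second class (`INFRAESTRUTURA`) and its four tickets are invented purely so the example can run; replace them with your real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# First four tickets come from the question; the last four and the class
# "INFRAESTRUTURA" are invented for illustration only.
texts = [
    "Prezados não estou conseguindo gerar uma nota fiscal do módulo de estoque e custos.",
    "Não consigo acessar o ERP com meu usuário e senha.",
    "Médico não consegue gerar receituário no módulo de Medicina e segurança do trabalho.",
    "O produto 4589658 tinta holográfica não está disponível no EIC e não consigo gerar a PO.",
    "A impressora do segundo andar não imprime.",
    "Sem acesso à internet na sala de reuniões.",
    "O computador não liga depois da queda de energia.",
    "A rede Wi-Fi está muito lenta hoje.",
]
labels = ["ERP GESTÃO"] * 4 + ["INFRAESTRUTURA"] * 4

# Bag-of-words baseline: TF-IDF features followed by a linear classifier.
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)

# Cross-validation gives an accuracy estimate to compare later changes against.
scores = cross_val_score(pipeline, texts, labels, cv=2)
print("accuracy per fold:", scores)

# Fit on all the data and classify a new, unseen ticket.
pipeline.fit(texts, labels)
prediction = pipeline.predict(["Não consigo gerar nota fiscal no ERP."])
print(prediction[0])
```

With a real dataset you would use more folds and look at per-class metrics as well, then compare any word-vector approach against this baseline's scores.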

Text classification with word2vec stack overflow tag predictor

Word2vec with Conv1D for text classification confusion

How to fix (do better) text classification model with using word2vec

Text similarity using Word2Vec

Classification accuracy is too low (Word2Vec)

Multilabel for text with word2vec

6 GB RAM Fails in Vectorizing text using Word2Vec

Sklearn+Gensim: How to use Gensim's Word2Vec embedding for Sklearn text classification

Using Word2Vec for word embedding of sentences

saving word2vec in text format


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2020-06-18 19:41:34
