简体   繁体   中英

Text classification using Word2Vec

I am in trouble to understand Word2Vec. I need to do a help desk text classification, based on what users complain in the help desk system. Each sentence has its own class.

I've seen some pre-trained word2vec files in the internet, but I don't know if is the best way to work since my problem is very specific. And my dataset is in Portuguese.

I'm considering that I will have to create my own model and I am in doubt on how to do that. Do I have to do it with the same words as the dataset I have with my sentences and classes?

In the frst line, the column titles. Below the first line, I have the sentence and the class. Could anyone help me? I saw Gensin to create vector models, and sounds me good. But I am completely lost.

: chamado,classe 'Prezados não estou conseguindo gerar uma nota fiscal do módulo de estoque e custos.','ERP GESTÃO', 'Não consigo acessar o ERP com meu usuário e senha.','ERP GESTÃO', 'Médico não consegue gerar receituário no módulo de Medicina e segurança do trabalho.','ERP GESTÃO', 'O produto 4589658 tinta holográfica não está disponível no EIC e não consigo gerar a PO.','ERP GESTÃO',

Your inquiry is very general, and normally StackOverflow will be more able to help when you've tried specific things, and hit specific problems - so that you can provide exact code, errors, or shortfalls to ask about.

But in general:

  • You might not need word2vec at all: there are many text-classification approaches that, with sufficient training data, may assign your texts to helpful classes without using word-vectors. You will likely want to try those first, then consider word-vectors as a later improvement.

  • For word-vectors to be helpful, they need to be based on your actual language, and also ideally your particular domain-of-concern. Generic word-vectors from news articles or even Wikipedia may not include the important lingo, and word-senses for your problem. But it's not too hard to train your own word-vectors – you just need a lot of varied, relevant texts that use the words in realistic, relevant contexts. So yes, you'd ideally train your word-vectors on the same texts you eventually want to classify.

But mostly, if you're "totally lost", start with more simple text-classification examples. As you're using Python, examples based on scikit-learn may be most relevant. Adapt those to your data & goals, to familiarize yourself with all the steps & the ways of evaluating whether your changes are improving your end results or not. Then investigate techniques like word-vectors.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM