
Classifying negative and positive words in large files?

I am trying to get the count of positive and negative words in a very large file. I only need a primitive approach (one that does not take ages). I have tried sentiwordnet but keep getting an IndexError: list index out of range, which I think is due to the words not being listed in the wordnet dictionary. The text contains a lot of typos and 'non-words'.

If someone could give any suggestion, I would be very grateful!

It all depends on what your data is like and what the final objective of your task is. You need to give us a more detailed description of your project but, in general, here are your options:

- Make your own sentiment analysis dictionary: I really doubt this is what you want to do, since it takes a lot of time and effort, but if your data is simple enough it's doable.
- Clean your data: if your tokens aren't in senti-wordnet because there's too much noise and too many badly spelled words, then try to correct them before passing them through wordnet. That will at least limit the number of errors you get.
- Use a senti-wordnet alternative: admittedly, there aren't that many good ones, but you can always try sentiment_classifier or nltk's sentiment if you're using Python (which, by the looks of your error, you are).
- Classify only what you can: this is what I would recommend. If the word is not in senti-wordnet, then move on to the next one. Just catch the error ( try: ... except IndexError: pass ) and try to infer the general sentiment of the data by counting the sentiment words you actually catch.
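The "classify only what you can" option can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the asker's code: the tiny `POSITIVE` and `NEGATIVE` sets and the `count_sentiment` function are hypothetical stand-ins for a real sentiment lexicon, and tokenization is a plain regex. Words that are not in the lexicon (typos, 'non-words') are simply counted as unknown and skipped.

```python
import io
import re
from collections import Counter

# Hypothetical toy lexicon; a real one would be loaded from a sentiment word list.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "awful", "sad", "hate"}

# Very rough tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")

def count_sentiment(lines):
    """Count positive, negative and unknown tokens in an iterable of text lines."""
    counts = Counter(positive=0, negative=0, unknown=0)
    for line in lines:  # one line at a time, so the whole file is never in memory
        for token in TOKEN_RE.findall(line.lower()):
            if token in POSITIVE:
                counts["positive"] += 1
            elif token in NEGATIVE:
                counts["negative"] += 1
            else:
                # Typos and out-of-lexicon words land here and are skipped.
                counts["unknown"] += 1
    return counts

# In-memory stand-in for a large file; note the misspelled "endign".
sample = io.StringIO("I love this, it is great\nthe endign was awful\n")
result = count_sentiment(sample)
```

For a real file, an open file object can be passed instead of the `io.StringIO` stand-in, since Python file objects yield lines lazily when iterated. Set membership tests are constant time on average, so the cost stays roughly linear in the number of tokens.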

PS: We would need to see your code to be sure, but I think there's another reason why you're getting an IndexError. If the word was not in senti-wordnet you would be getting a KeyError, but it also depends on how you coded your function.
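Which exception shows up really does depend on how the lookup is written. The sketch below uses plain Python stand-ins (all names are hypothetical, and neither structure is the actual senti-wordnet API): a dictionary lookup on a missing word raises KeyError, while taking the first element of an empty list of matches raises IndexError.

```python
# Hypothetical stand-ins for two ways a lexicon lookup might be coded.
scores_by_word = {"good": 0.75}      # dict-style lexicon: word -> score
matches_by_word = {"good": [0.75]}   # lookup that returns a list of matches

def lookup_dict(word):
    # A missing key raises KeyError.
    return scores_by_word[word]

def lookup_first_match(word):
    # An unknown word yields an empty list, so [0] raises IndexError.
    matches = matches_by_word.get(word, [])
    return matches[0]

def error_name(func, word):
    """Return the name of the lookup error raised for `word`, or None."""
    try:
        func(word)
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None

dict_error = error_name(lookup_dict, "gooood")
list_error = error_name(lookup_first_match, "gooood")
```

If the function in question indexes into a list of results, an IndexError for unknown words would be consistent with the asker's guess; catching both exception types, as `error_name` does, covers either case.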

Good luck and I hope this was helpful.

