How to find total number of positive and negative words from a text?

I want to find the total number of positive and negative words matched in a given text. I have a list of positive words in positive.txt and a list of negative words in negative.txt. Whenever a word from the text matches the positive list, a simple integer counter should be incremented by 1, and likewise a second counter for matches against the negative list. The code below extracts the link text found under //h3[@class="story-hed"]. This is the text that I want to compare against the positive and negative word lists, along with a total count of its words. My code is:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from dawn.items import DawnItem

class dawnSpider(BaseSpider):
   name = "dawn"
   allowed_domains = ["dawn.com"]
   start_urls = [
       "http://dawn.com/"
   ]

   def parse(self, response):

      hxs = HtmlXPathSelector(response)      
      sites = hxs.select('//h3[@class="story-hed"]//a/text()').extract()
      items=[]

      for site in sites:
         item=DawnItem()
         item['title']=site
         items.append(item)
      return items

The standalone code below could do the trick:

from collections import Counter

def readwords( filename ):
    # 'with' closes the file once the words have been read
    with open(filename) as f:
        words = [ line.rstrip() for line in f ]
    return words

positive = readwords('positive.txt')
negative = readwords('negative.txt')

paragraph = 'this is really bad and in fact awesome. really awesome.'

count = Counter(paragraph.split())

pos = 0
neg = 0
for key, val in count.iteritems():
    key = key.rstrip('.,?!\n') # removing possible punctuation signs
    if key in positive:
        pos += val
    if key in negative:
        neg += val

print pos, neg

Here is what I have in the two input files:

positive.txt:

good 
awesome

negative.txt:

bad
ugly

and the output is: 2 1

To implement this in Scrapy, you might want to use an item pipeline: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
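As a rough illustration of the pipeline idea, here is a minimal Python 3 sketch. It relies only on the `open_spider` and `process_item(item, spider)` hooks described in the linked documentation. The class name `WordCountPipeline`, the helper `load_words`, and the item fields `positive`, `negative` and `total` are assumptions made for this example; they would have to be declared on `DawnItem`, and the pipeline would need to be enabled in the project's `ITEM_PIPELINES` setting.

```python
import re


def load_words(path):
    """Read one word per line into a set (hypothetical helper)."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


class WordCountPipeline:
    """Hypothetical pipeline that counts sentiment words in item['title']."""

    # Assumed file locations, matching the names used in the question
    positive_path = "positive.txt"
    negative_path = "negative.txt"

    def open_spider(self, spider):
        # Called once when the spider starts: load both word lists
        self.positive = load_words(self.positive_path)
        self.negative = load_words(self.negative_path)

    def process_item(self, item, spider):
        # Lower-case the title and keep only runs of letters/apostrophes
        words = re.findall(r"[a-z']+", item["title"].lower())
        # 'positive', 'negative' and 'total' are assumed item fields
        item["positive"] = sum(w in self.positive for w in words)
        item["negative"] = sum(w in self.negative for w in words)
        item["total"] = len(words)
        return item
```

Because the counting lives in `process_item`, the spider itself stays unchanged and simply returns items with a `title`.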

First you may want to read the files. Assuming there is one word per line, you can read all the words with the following code:

positive = [l.strip() for l in open("positive.txt")]

Once done, you can create a dict that holds each word as the key and its count as the value. To initialize every count to zero you can use:

positive_count = dict.fromkeys(positive, 0)

Then just iterate over all the words of the text (items below) and increment the count whenever a word is found:

for item in items:
    if item in positive_count:
        positive_count[item] += 1

Finally, you can print the results with:

for item, value in positive_count.items():
    print("Word %s count %d" % (item, value))

The negative words are handled the same way; they are omitted here to keep the answer short.
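For reference, the per-word counting described in this answer can be sketched in Python 3 with `collections.Counter`. The function name `per_word_counts` and the regular expression used for tokenising are assumptions for the example, not part of the original answer.

```python
import re
from collections import Counter


def per_word_counts(text, vocabulary):
    """Return {word: occurrences in text} for every word in vocabulary."""
    # Assumed tokenisation: lower-case runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in vocabulary)
    # Words that never occur are reported with a count of zero
    return {word: counts[word] for word in vocabulary}


result = per_word_counts(
    "this is really bad and in fact awesome. really awesome.",
    {"good", "awesome"},
)
for word, value in sorted(result.items()):
    print("Word %s count %d" % (word, value))
```

Calling the same function with the negative word list gives the negative counts.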

This depends on the size of the word lists. If they are smallish (less than a few kB), then read them into a list:

with open(positive_wordlist_file_name) as fd:
  positive_words = [line.strip() for line in fd]
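If the lists grow larger, membership tests on a list become slow because each lookup scans the whole list. One possible variation, sketched here in Python 3 under the assumption that matching should be case-insensitive, is to load the words into a set instead. The name `load_word_set` is hypothetical.

```python
def load_word_set(lines):
    """Build a set of lower-cased words from an iterable of lines.

    'lines' can be an open file handle or any list of strings;
    blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}


# Usage with an in-memory list standing in for the file contents
positive_words = load_word_set(["good \n", "awesome\n", "\n"])
print("awesome" in positive_words)
```

With a real file, the call would be wrapped in `with open(...) as fd:` exactly as in the list version.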

Once you have the two word lists, you can go through the text with them, line by line if you can. Split each line into words, and then use the "in" operator to check each word against the lists. I'd use a couple of co-routines in a class for it:

import re

class WordCounter:
  def __init__(self, positive_words, negative_words):
    # Word lists are stored as sets so the "in" checks stay fast
    self.positive_words = set(positive_words)
    self.negative_words = set(negative_words)
    self.positive_count = 0
    self.negative_count = 0

  def positive_word_counter(self):
    """Co-routine that counts positive words in each list of words sent to it."""
    while True:
      words = yield
      self.positive_count += sum(1 for word in words if word in self.positive_words)

  def negative_word_counter(self):
    """Co-routine that counts negative words in each list of words sent to it."""
    while True:
      words = yield
      self.negative_count += sum(1 for word in words if word in self.negative_words)

  def read_text(self, text):
    """Text - some iterable of lines - a file handle, or list or whatever."""
    # Split each line on runs of non-word characters
    line_words = (re.split(r'\W+', line.strip()) for line in text)
    # Create and prime coroutines
    positive_counter = self.positive_word_counter()
    next(positive_counter)
    negative_counter = self.negative_word_counter()
    next(negative_counter)
    # Now fire it in
    for words in line_words:
      positive_counter.send(words)
      negative_counter.send(words)
    # The totals can now be read from self.positive_count and self.negative_count
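The co-routine approach depends on two generator operations: priming the generator with `next()` so that it runs up to its first `yield`, and then feeding it data with `send()`. The following Python 3 sketch shows just that mechanism on its own. The function `match_counter` and the `totals` dict are hypothetical names chosen for the illustration.

```python
def match_counter(vocabulary, totals, key):
    """Generator-based co-routine: for every list of words sent in,
    add the number of words found in vocabulary to totals[key]."""
    totals[key] = 0
    while True:
        words = yield  # receives the value passed to send()
        totals[key] += sum(1 for w in words if w in vocabulary)


totals = {}
pos = match_counter({"good", "awesome"}, totals, "positive")
neg = match_counter({"bad", "ugly"}, totals, "negative")

# Prime both co-routines so they are paused at their first yield
next(pos)
next(neg)

for line in ["this is really bad", "and in fact awesome", "really awesome"]:
    words = line.split()
    pos.send(words)
    neg.send(words)

print(totals)  # {'positive': 2, 'negative': 1}
```

Calling `send()` on a generator that has not been primed raises a `TypeError`, which is why the `next()` calls come first.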

for key, val in count.iteritems(): works only in Python 2. If you are using Python 3, use count.items() instead:

for key, val in count.items():
    key = key.rstrip('.,?!\n') # removing possible punctuation signs
    if key in positive:
        pos += val
    if key in negative:
        neg += val
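Put together as a complete Python 3 script, the loop might look like the sketch below. It differs from the answer above in one respect, which is an assumption of this example: punctuation is stripped and words are lower-cased before counting, so that "Awesome." and "awesome" share one key. The function name `count_matches` is hypothetical, and the word lists are given inline instead of being read from the files.

```python
from collections import Counter


def count_matches(paragraph, positive, negative):
    """Return (positive matches, negative matches) for a paragraph."""
    # Normalise each token first: strip punctuation, then lower-case
    tokens = (tok.strip('.,?!\n').lower() for tok in paragraph.split())
    count = Counter(tok for tok in tokens if tok)

    pos = neg = 0
    for key, val in count.items():  # items() replaces Python 2's iteritems()
        if key in positive:
            pos += val
        if key in negative:
            neg += val
    return pos, neg


pos, neg = count_matches(
    'this is really bad and in fact awesome. really awesome.',
    {'good', 'awesome'},
    {'bad', 'ugly'},
)
print(pos, neg)  # 2 1
```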
