
How do I match an exact word in text using Python?

I want to count how many lines contain a word that matches one of the keywords I have chosen, so I wrote this:

    for each_keyword in keywords:
        if each_keyword in text:
            related_tweet_count += 1
            print "related_tweet_count", related_tweet_count
            print text

It works well, but it has a problem. For example, with the keyword "flu" it matches not only "flu" but also "influence". To solve this, I looked up whole-word matching examples and changed the code like this:

    for each_keyword in keywords:
        if re.search('\beach_keyword\b', text, re.I):
            related_tweet_count += 1
            print "related_tweet_count", related_tweet_count
            print text

But it doesn't work. Please help me out!

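You need to actually substitute each_keyword into the regular expression. At the moment it's literally trying to match the text "each_keyword". Also, in an ordinary (non-raw) string literal Python reads '\b' as a backspace character, so the backslash has to be doubled or a raw string used for it to reach the regex engine as a word boundary.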

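import re

for each_keyword in keywords:
    # re.escape keeps any special characters in the keyword literal;
    # the raw strings keep \b as a word boundary rather than a backspace.
    if re.search(r'\b' + re.escape(each_keyword) + r'\b', text, re.I):
        related_tweet_count += 1
        print "related_tweet_count", related_tweet_count
        print text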
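
One thing the loop above does not address: a text that contains several keywords is counted once per keyword. If the goal is to count each line at most once, a possible variation is to compile all keywords into a single pattern. The following is only a sketch under assumptions: it targets Python 3 (so print is a function), the helper names build_keyword_pattern and count_related are hypothetical, and the sample tweets are made up for illustration.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    # Escape each keyword so characters such as '.' are matched literally.
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    # Raw strings keep \b as a regex word boundary.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def count_related(tweets, keywords):
    """Count the tweets that contain at least one keyword as a whole word."""
    pattern = build_keyword_pattern(keywords)
    return sum(1 for text in tweets if pattern.search(text))

# Illustrative data, not taken from the original question.
sample_tweets = [
    "I think I caught the flu",
    "Her influence is growing",
    "Flu season is here, and I have a fever",
]
related_tweet_count = count_related(sample_tweets, ["flu", "fever"])
print("related_tweet_count", related_tweet_count)
```

Note that \b only behaves as expected when a keyword starts and ends with a letter, digit, or underscore; a keyword such as "c++" would need different boundary handling.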

Alternatively, do it without regular expressions by checking for the keyword together with the spaces and punctuation that can surround it:

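for keyword in keywords:
    # Variants of the keyword with surrounding spaces and punctuation.
    kw_list = [' '+keyword+',', ' '+keyword+' ', ' '+keyword+'.', '. '+keyword]
    for kw in kw_list:
        if kw in text:
            related_tweet_count += 1
            break  # count this keyword only once per text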
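
A fixed list of variants can miss cases such as a keyword at the very start of the text or one followed by "!" or "?", and the '. '+keyword variant still matches longer words that merely begin with the keyword. One regex-free alternative, sketched here under assumptions, is to split the text into tokens and strip punctuation from each before comparing. The helper name contains_keyword and the sample tweets are hypothetical, the code assumes Python 3, and it only handles single-word keywords.

```python
import string

def contains_keyword(text, keywords):
    """Return True if any whitespace-separated token equals a keyword."""
    wanted = {kw.lower() for kw in keywords}
    # Strip leading/trailing punctuation so "flu," and "flu." both become "flu".
    tokens = (token.strip(string.punctuation).lower() for token in text.split())
    return any(token in wanted for token in tokens)

# Illustrative data, not taken from the original question.
sample_tweets = [
    "Flu, again!",
    "Her influence is growing",
    "Down with the flu.",
]
related_tweet_count = sum(
    1 for text in sample_tweets if contains_keyword(text, ["flu"])
)
print("related_tweet_count", related_tweet_count)
```

Because each text is tested once against the whole keyword set, a text that mentions several keywords still adds only one to the count.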


 