
Occurrence of a string in a text in Python

There are numerous posts about counting the occurrences of a substring in Python, but I can't find anything about counting the occurrences of a string as a whole word in a text.

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

# Suppose my search term is "a"; then I would expect the output of my program to be:
print(testSTR.myfunc("a"))
>>1

This is because there is only one standalone occurrence of the string "a" in the entire input. count() won't do, because it also counts substrings inside other words, so the output I get is:

print(testSTR.count("a"))
>>5

Can something like this be done?

You can use collections.Counter to do it after splitting the string on whitespace.

from collections import Counter
print(Counter(testSTR.split()))

The output would look like this:

Counter({'you': 2, 'a': 1, 'and': 1, 'words': 1, 'text': 1, 'some': 1, 'the': 1, 'large': 1, 'to': 1, 'Suppose': 1, 'are': 1, 'have': 1, 'of': 1, 'specific': 1, 'trying': 1, 'find': 1, 'occurences': 1})

To get the count of a specific word such as a, use:

from collections import Counter
res = Counter(testSTR.split())
print(res['a'])

If the count needs to be case-insensitive, convert the words using upper() or lower() before counting.

res = Counter(i.lower() for i in testSTR.split())
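
The split-and-count approach above could be wrapped in a small helper. This is only a sketch: the name count_word and its ignore_case parameter are made up for illustration, and it assumes that words are separated by whitespace only, so a token such as "a," would not be counted as "a".

```python
from collections import Counter

def count_word(text, word, ignore_case=False):
    """Count whitespace-delimited tokens equal to `word` (hypothetical helper)."""
    tokens = text.split()
    if ignore_case:
        # Normalise both the tokens and the search term before comparing.
        tokens = [t.lower() for t in tokens]
        word = word.lower()
    # A Counter returns 0 for keys it has never seen, so no KeyError here.
    return Counter(tokens)[word]

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"
print(count_word(testSTR, "a"))                          # 1
print(count_word(testSTR, "suppose"))                    # 0
print(count_word(testSTR, "suppose", ignore_case=True))  # 1
```

If many different words have to be looked up in the same text, it is probably cheaper to build the Counter once and reuse it instead of calling a helper like this repeatedly.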

I think the most straightforward way is to use regular expressions:

import re
testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

print(len(re.findall(r"\ba\b", testSTR)))
# 1

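\ba\b checks for a "word boundary" both before and after the a, where a "word boundary" is the position between a word character (a letter, digit or underscore) and a non-word character such as punctuation or a space, or the beginning or end of the whole string. This is more useful than just splitting on whitespace, unless splitting on whitespace is what you want, of course...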

import re
str2 = "a large text a, a. a"

print(len(re.findall(r"\ba\b", str2)))
# 4
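
For an arbitrary search term rather than a hard-coded a, the pattern can be built at run time. The sketch below is one possible way to do that; count_whole_word and ignore_case are hypothetical names, not part of the re module. It assumes the search term starts and ends with a letter, digit or underscore, because \b only matches next to such characters.

```python
import re

def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of `word` delimited by word boundaries (hypothetical helper)."""
    # re.escape keeps characters such as "." in the search term literal,
    # so "a.b" does not accidentally match "axb".
    pattern = r"\b" + re.escape(word) + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(pattern, text, flags))

print(count_whole_word("a large text a, a. a", "a"))              # 4
print(count_whole_word("You and you", "you"))                     # 1
print(count_whole_word("You and you", "you", ignore_case=True))   # 2
```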

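If you are concerned about punctuation, you could strip it from each word after splitting:

# Lists have no .map() method, so use a list comprehension instead.
words = [s.strip(".!?:;,\"'") for s in testSTR.split()]
print("a" in words)      # True if "a" occurs as a whole word
print(words.count("a"))  # number of whole-word occurrences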
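
A slightly more general variant of the punctuation-stripping idea might use string.punctuation from the standard library instead of a hand-written character list. The helper name count_stripped is made up for this sketch, and it assumes that punctuation only needs to be removed from the start and end of each whitespace-separated token.

```python
import string

def count_stripped(text, word):
    """Count tokens equal to `word` after trimming punctuation (hypothetical helper)."""
    # strip() only removes characters from both ends of a token,
    # so punctuation inside a token (as in "a-b") is left alone.
    words = [token.strip(string.punctuation) for token in text.split()]
    return words.count(word)

print(count_stripped("a large text a, a. a", "a"))  # 4
print(count_stripped("a-b a", "a"))                 # 1
```

Unlike the \b-based regular expression, this treats a hyphenated token such as "a-b" as a single word, which may or may not be what you want.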
