
How to count occurrences of a word in a string that still works with periods and endings

So I was recently working on this function:

# counts owls
def owl_count(text):
    # sets all text to lowercase
    text = text.lower()
    
    # sets text to list
    text = text.split()
    
    # saves indices of owl in list
    indices = [i for i, x in enumerate(text) if x == "owl"]
    
    # counts occurrences of owl in text
    owl_count = len(indices)
    
    # returns owl count and indices
    return owl_count, indices

My goal was to count how many times "owl" occurs in the string and save the indices of it. The issue I kept running into was that it would not count "owls" or "owl." I tried splitting it into a list of individual characters but I couldn't find a way to search for three consecutive elements in the list. Do you guys have any ideas on what I could do here?

PS. I'm definitely a beginner programmer so this is probably a simple solution.

thanks!

If you don't want to use a large library like NLTK, you can match words that start with 'owl' instead of requiring them to equal 'owl':

indices = [i for i, x in enumerate(text) if x.startswith("owl")]

This will also accept words like 'owlowlowl'; to tokenize real-world text properly, use a library such as NLTK.
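A minimal sketch of this approach applied to the function from the question, assuming that any word beginning with 'owl' should count and that surrounding punctuation should be ignored. The name owl_count_prefix and the sample sentence are made up for illustration:

```python
import string

def owl_count_prefix(text):
    # lowercase and split on whitespace, as in the question
    words = text.lower().split()
    # strip leading/trailing punctuation, so "(owls)" becomes "owls"
    words = [w.strip(string.punctuation) for w in words]
    # keep the indices of words that begin with "owl"
    indices = [i for i, w in enumerate(words) if w.startswith("owl")]
    return len(indices), indices

print(owl_count_prefix("An owl saw two owls. The Owl, however, slept."))
# prints (3, [1, 4, 6])
```

The indices here are word positions in the split list, the same kind of index the original function returns.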

Python has built-in support for this. This kind of string matching is done with regular expressions (the re module), which you can study in more detail later.

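import re

a_string = "your string"
substring = "substring that you want to check"

# finditer() treats substring as a regular expression pattern
matches = re.finditer(substring, a_string)

# collect the starting index of each match
matches_positions = [match.start() for match in matches]

print(matches_positions)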

finditer() returns an iterator of match objects, and start() returns the starting index of each match.

Simply put, this gives the starting index of every occurrence of the substring in the string.
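For the owl question specifically, a pattern with word boundaries avoids matching 'owl' inside other words such as 'bowl'. This is only a sketch, assuming that just 'owl' and 'owls' should count, in any letter case. The function name owl_positions and the sample sentence are made up for illustration:

```python
import re

def owl_positions(text):
    # \b marks a word boundary; s? makes the plural ending optional
    pattern = re.compile(r"\bowls?\b", re.IGNORECASE)
    # start() gives the character offset of each match in the string
    positions = [m.start() for m in pattern.finditer(text)]
    return len(positions), positions

print(owl_positions("An owl saw two owls. The Owl, in a bowl, slept."))
# prints (3, [3, 15, 25])
```

Note that these are character offsets into the string, not word indices as in the list-based version from the question.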

