
How to count occurrences of a word in a string so that it still works with periods and endings

So I was recently working on this function:

# counts owls
def owl_count(text):
    # sets all text to lowercase
    text = text.lower()
    
    # sets text to list
    text = text.split()
    
    # saves indices of owl in list
    indices = [i for i, x in enumerate(text) if x == "owl"]
    
    # counts occurences of owl in text
    owl_count = len(indices)
    
    # returns owl count and indices
    return owl_count, indices

My goal was to count how many times "owl" occurs in the string and save its indices. The issue I kept running into was that it would not count "owls" or "owl." I tried splitting the text into a list of individual characters, but I couldn't find a way to search for three consecutive elements in the list. Do you guys have any ideas on what I could do here?

PS. I'm definitely a beginner programmer, so this probably has a simple solution.

Thanks!

If you don't want to use huge libraries like NLTK, you can filter for words that start with 'owl' instead of words that are equal to 'owl':

indices = [i for i, x in enumerate(text) if x.startswith("owl")]

In this case, words like 'owlowlowl' will pass too; to tokenize words properly for real-world text, one should use NLTK.
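As a rough sketch of how that filter could slot into the function from the question, assuming a plain whitespace split is good enough (the name `owl_count_prefix` and the sample sentence are made up for this illustration):

```python
def owl_count_prefix(text):
    # lowercase and split on whitespace, as in the question
    words = text.lower().split()

    # keep the index of every word that starts with "owl",
    # so "owl", "owls" and "owl." all match
    indices = [i for i, word in enumerate(words) if word.startswith("owl")]

    # return the count and the word indices
    return len(indices), indices


print(owl_count_prefix("An owl saw two owls. The Owl. hooted at a bowl"))
# prints (3, [1, 4, 6])
```

Note that "bowl" is skipped because it does not start with "owl", while something like "owlish" would still be counted.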

Python's standard library has functions for this in the re module. This kind of string matching falls under something called regular expressions, which you can study in more detail later.

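import re

a_string = "your string"
substring = "substring that you want to check"

# finditer treats substring as a regular expression pattern
matches = re.finditer(substring, a_string)

# collect the starting character index of every match
matches_positions = [match.start() for match in matches]

print(matches_positions)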

finditer() returns an iterator over match objects, and start() returns the starting index of each match.

Simply put, it returns the indices of all occurrences of the substring in the string.
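Applied to the owl question, a minimal sketch might look like the following. The pattern `\bowls?\b` is an assumption about what should count as a match ("owl" or "owl" with a plural "s", as a whole word), and the function name `owl_positions` is hypothetical:

```python
import re


def owl_positions(text):
    # \b marks a word boundary, s? allows an optional plural "s";
    # IGNORECASE makes "Owl" and "owl" match alike
    pattern = re.compile(r"\bowls?\b", re.IGNORECASE)

    # start() gives the character offset of each match in the original string
    positions = [match.start() for match in pattern.finditer(text)]

    return len(positions), positions


print(owl_positions("An owl saw two owls. The Owl. hooted at a bowl"))
# prints (3, [3, 15, 25])
```

Keep in mind that these are character offsets into the string, not positions in a list of words, so they differ from the indices produced by the split-based function in the question.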
