Python regular expression to search for words in a sentence

I'm still learning the ropes with Python and regular expressions and I need some help, please! I need a regular expression that can search a sentence for specific words. I have managed to create a pattern to search for a single word, but how do I retrieve the other words I need to find? What would the re pattern look like to do this?

>>> import re
>>> question = "the total number of staff in 30?"
>>> re_pattern = r'\btotal.*?\b'
>>> m = re.findall(re_pattern, question)
>>> m
['total']

It must look for the words "total" and "staff". Thanks, Mike

Use the alternation (union) operator | to search for all the words you need to find:

In [20]: re_pattern = r'\b(?:total|staff)\b'

In [21]: re.findall(re_pattern, question)
Out[21]: ['total', 'staff']
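When the list of words is longer or comes from user input, it can be easier to build the alternation programmatically. The sketch below assumes a hypothetical helper, build_word_pattern, that escapes each word with re.escape (so characters such as . or + are matched literally) and wraps the alternatives in the same \b(?:...)\b pattern:

```python
import re

def build_word_pattern(words):
    # Hypothetical helper: join the words with the alternation operator |,
    # escaping each one so regex metacharacters are matched literally.
    alternatives = "|".join(re.escape(w) for w in words)
    # \b on both sides restricts matches to whole words.
    return re.compile(r"\b(?:" + alternatives + r")\b")

question = "the total number of staff in 30?"
pattern = build_word_pattern(["total", "staff"])
print(pattern.findall(question))  # ['total', 'staff']
```

If capitalization should not matter, pass re.IGNORECASE as the second argument to re.compile.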

This stays closest to the pattern in your question. Keep in mind how \b works: it is a zero-width assertion that matches wherever a word character (a letter, digit or underscore) meets a non-word character, or at the start or end of the string. Punctuation at the end of a clause, such as a comma, dot, exclamation mark or question mark, therefore still counts as a word boundary.

For example, in the question How many people are in your staff? the pattern above does find the word staff, because the position between f and ? is a word boundary. The second \b is still essential: if you leave it out at the end of the regular expression above, the expression wrongly detects words inside longer words, such as total in totally or totalities.

Another way to accomplish what you want is to extract all runs of alphanumeric characters (\w+) from your sentence first and then search this list for the words you need to find:

In [51]: def find_all_words(words, sentence):
....:     all_words = re.findall(r'\w+', sentence)
....:     words_found = []
....:     for word in words:
....:         if word in all_words:
....:             words_found.append(word)
....:     return words_found

In [52]: print(find_all_words(['total', 'staff'], 'The total number of staff in 30?'))
['total', 'staff']

In [53]: print(find_all_words(['total', 'staff'], 'My staff is totally overworked.'))
['staff']
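find_all_words compares words case-sensitively, so "The" in a sentence would not match "the", and it scans a list for every word it looks up. If case should not matter, a minimal sketch (the name find_all_words_ci is hypothetical) lower-cases the tokens once and keeps them in a set for fast membership tests:

```python
import re

def find_all_words_ci(words, sentence):
    # Hypothetical case-insensitive variant of find_all_words:
    # lower-case every \w+ token and store the tokens in a set.
    sentence_words = {w.lower() for w in re.findall(r"\w+", sentence)}
    # Iterate over the requested words so the result keeps their order.
    return [word for word in words if word.lower() in sentence_words]

print(find_all_words_ci(["Total", "staff"], "The TOTAL number of Staff in 30?"))
# ['Total', 'staff']
```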
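
A more compact version of the extract-then-check approach, using a list comprehension:

>>> import re
>>> question = "the total number of staff in 30?"
>>> find = ["total", "staff"]
>>> words = re.findall(r"\w+", question)
>>> result = [x for x in find if x in words]
>>> result
['total', 'staff']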

Have you thought about using something other than a regex?

Consider this, and if it works, expand from this solution:

>>> 'total' in question.split()
True

Similarly:

>>> words = ['total', 'staff']
>>> sentence_words = question.split()
>>> [e for e in words if e in sentence_words]
['total', 'staff']
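Plain split() only separates on whitespace, so punctuation stays attached to the neighboring word: in "How many people are in your staff?" the last token is "staff?", and 'staff' in sentence.split() is False. One way to expand the solution, sketched below with a hypothetical helper words_in_sentence, is to strip punctuation from both ends of each token with str.strip and string.punctuation:

```python
import string

def words_in_sentence(words, sentence):
    # Hypothetical extension of the split() approach: strip leading and
    # trailing punctuation so "staff?" becomes "staff" and "30?" becomes "30".
    tokens = {token.strip(string.punctuation) for token in sentence.split()}
    # Iterating over a list of search words keeps the output order predictable.
    return [word for word in words if word in tokens]

print(words_in_sentence(["total", "staff"], "How many people are in your staff?"))
# ['staff']
```

Punctuation inside a token (for example the apostrophe in staff's) is left in place, so such tokens still will not match a bare word.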

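Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.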

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM