
Regex whole-word match between numbers

I want to extract a whole word from a sentence. Thanks to this answer, I have the following:

import re

def findWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

With it, I can find the whole word in cases like these:

findWholeWord('thomas')('this is Thomas again')   # -> <match object>
findWholeWord('thomas')('this is,Thomas again')   # -> <match object>
findWholeWord('thomas')('this is,Thomas, again')  # -> <match object>
findWholeWord('thomas')('this is.Thomas, again')  # -> <match object>
findWholeWord('thomas')('this is ?Thomas again')  # -> <match object>

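Punctuation next to the word does not interfere.

However, if a digit is directly next to the word, the word is not found.

How should I modify the expression so that it also matches when the word has a digit next to it? For example: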

findWholeWord('thomas')('this is 9Thomas, again')
findWholeWord('thomas')('this is9Thomas again')
findWholeWord('thomas')('this is Thomas36 again')
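
A small sketch of why these cases fail, reusing the sample strings above: in Python's re module, \b matches only between a word character (\w) and a non-word character, and digits are word characters, so there is no boundary between 9 and T or between s and 3. The names pattern, samples and results are only illustrative.

```python
import re

# \b only matches between a \w character and a non-\w character.
# Digits belong to \w, so "9T" and "s3" contain no word boundary.
pattern = re.compile(r'\bthomas\b', flags=re.IGNORECASE)

samples = [
    'this is Thomas again',    # spaces on both sides: boundary on both sides
    'this is 9Thomas, again',  # digit before the word: no boundary there
    'this is Thomas36 again',  # digit after the word: no boundary there
]

results = {s: bool(pattern.search(s)) for s in samples}
for s, found in results.items():
    print(found, '>', s)
```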

You can use the regex (?:\d|\b){0}(?:\d|\b) to match the target word when each side of it is either a word boundary or a digit.

import re

def findWholeWord(w):
    return re.compile(r'(?:\d|\b){0}(?:\d|\b)'.format(w), flags=re.I).search

for s in [
    'this is Thomas again',
    'this is,Thomas again',
    'this is,Thomas, again',
    'this is.Thomas, again',
    'this is ?Thomas again',
    'this is 9Thomas, again',
    'this is9Thomas again',
    'this is Thomas36 again',
    'this is -Thomas- again',
    'athomas is no match',
    'thomason no match']:
    print("match >" if findWholeWord('thomas')(s) else "*no match* >", s)

Output:

match > this is Thomas again
match > this is,Thomas again
match > this is,Thomas, again
match > this is.Thomas, again
match > this is ?Thomas again
match > this is 9Thomas, again
match > this is9Thomas again
match > this is Thomas36 again
match > this is -Thomas- again
*no match* > athomas is no match
*no match* > thomason no match
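
A variation on the same idea, offered as a sketch rather than as part of the answer above: lookarounds can express "not next to a letter or underscore" without consuming the neighbouring digit, so the match contains only the word itself. The helper name find_word_allowing_digits is hypothetical, and re.escape is added on the assumption that the target is literal text rather than a regex fragment.

```python
import re

def find_word_allowing_digits(w):
    # Hypothetical helper, not from the answer above.
    # [^\W\d] matches a word character that is not a digit (a letter or "_").
    # The lookarounds reject such a neighbour but consume nothing,
    # so adjacent digits are allowed and stay out of the match.
    pattern = r'(?<![^\W\d]){0}(?![^\W\d])'.format(re.escape(w))
    return re.compile(pattern, flags=re.IGNORECASE).search

matcher = find_word_allowing_digits('thomas')
for s in ['this is9Thomas again', 'this is Thomas36 again',
          'athomas is no match', 'thomason no match']:
    m = matcher(s)
    print(m.group(0) if m else None, '>', s)
```

For words that begin and end with a word character, this accepts and rejects the same strings as the pattern above; the difference is that the reported match is Thomas rather than 9Thomas or Thomas3.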

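If you want to reuse the same target word against several inputs or inside a loop, assign the result of findWholeWord() to a variable once and then call that variable.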

matcher = findWholeWord('thomas')
print(matcher('this is Thomas again'))
print(matcher('this is,Thomas again'))

You can use the following code:

import re

def findWholeWord(w):
    return re.compile(r'(?:\d+{0}|{0}\d+|\b{0}\b)'.format(w), flags=re.I).search


print(findWholeWord('thomas')('this is 9Thomas, again'))
print(findWholeWord('thomas')('this is9Thomas again'))
print(findWholeWord('thomas')('this is Thomas36 again'))
print(findWholeWord('thomas')('this is Thomas again'))
print(findWholeWord('thomas')('this is,Thomas again'))
print(findWholeWord('thomas')('this is,Thomas, again'))
print(findWholeWord('thomas')('this is.Thomas, again'))
print(findWholeWord('thomas')('this is ?Thomas again'))
print(findWholeWord('thomas')('this is aThomas again'))

Output:

<re.Match object; span=(8, 15), match='9Thomas'>
<re.Match object; span=(7, 14), match='9Thomas'>
<re.Match object; span=(8, 16), match='Thomas36'>
<re.Match object; span=(8, 14), match='Thomas'>
<re.Match object; span=(8, 14), match='Thomas'>
<re.Match object; span=(8, 14), match='Thomas'>
<re.Match object; span=(8, 14), match='Thomas'>
<re.Match object; span=(9, 15), match='Thomas'>
None

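(?:\d+{0}|{0}\d+|\b{0}\b) matches the given word when it is directly preceded by one or more digits, directly followed by one or more digits, or standing alone as a whole word.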
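
One behaviour of this alternation may be worth checking, shown here as a sketch: the \d+{0} and {0}\d+ branches each constrain only one side, so the word is also found inside a longer word when a digit touches the other side, and digits on both sides are only partly captured. The variant named strict is an assumption about what might be wanted, not part of the answer, and the sample strings are made up for illustration.

```python
import re

word = 'thomas'

# Pattern from the answer above: each digit branch checks one side only.
loose = re.compile(r'(?:\d+{0}|{0}\d+|\b{0}\b)'.format(word), flags=re.I)

# Hypothetical stricter variant: every side needs digits or a word boundary.
strict = re.compile(r'(?:\d+|\b){0}(?:\d+|\b)'.format(word), flags=re.I)

for s in ['id 9Thomasson', 'id 9Thomas36']:
    for name, rx in [('loose', loose), ('strict', strict)]:
        m = rx.search(s)
        print(name, '>', s, '>', m.group(0) if m else None)
```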

