Regex to find words that start or end with a particular letter
Write a function called getWords(sentence, letter) that takes a sentence and a letter and returns a list of the words that start or end with that letter, but not both, regardless of the letter's case.
For example:
>>> s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> getWords(s, "t")
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
My attempt:
regex = (r'[\w]*'+letter+r'[\w]*')
return (re.findall(regex,sentence,re.I))
My output:
['The', 'TART', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'until', 'next']
\b detects word breaks. Verbose mode allows multi-line regexes and comments. Note that [^\W] is the same as \w, but to match \w except for a certain letter, you need [^\W{letter}].
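To make the negated class concrete, here is a minimal sketch. The sample string and the variable names are my own, picked only for illustration. `[^\Wt]` reads as "not a non-word character and not t", which leaves every word character except t:

```python
import re

letter = "t"  # illustrative choice; any single letter behaves the same way
sample = "a_1 t!"  # made-up sample text

# [^\W] on its own is equivalent to \w: "not a non-word character".
same_as_w = re.findall(r"[^\W]", sample)

# Adding the letter to the negated class removes it from the matches.
word_chars_except_letter = re.findall(r"[^\W{}]".format(letter), sample)

print(same_as_w)                 # ['a', '_', '1', 't']
print(word_chars_except_letter)  # ['a', '_', '1']
```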
import re

def getWords(s, t):
    pattern = r'''(?ix)           # ignore case, verbose mode
        \b{letter}                # starts with letter
        \w*                       # zero or more additional word characters
        [^{letter}\W]\b           # ends with a word character that isn't letter
        |                         # OR
        \b[^{letter}\W]           # starts with a word character that isn't letter
        \w*                       # zero or more additional word characters
        {letter}\b                # ends with letter
        '''.format(letter=t)
    return re.findall(pattern, s)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s, 't'))
Output:
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
This is easy to do with the startswith() and endswith() methods.
def getWords(s, letter):
    letter = letter.lower()
    # Comparing the two booleans with != keeps words where exactly one end matches.
    return [word for word in s.split()
            if word.lower().startswith(letter) != word.lower().endswith(letter)]

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))
Output (split() does not strip punctuation, so 'Thursdays,' keeps its trailing comma):
['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
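If the trailing comma is unwanted, one option is to strip punctuation from each token before testing it. The following is only a sketch under that assumption: the function name `get_words_stripped` is hypothetical, and it treats everything in `string.punctuation` as strippable, which may not suit every text.

```python
import string

def get_words_stripped(s, letter):
    """Hypothetical variant: same start/end test, ignoring surrounding punctuation."""
    letter = letter.lower()
    result = []
    for raw in s.split():
        word = raw.strip(string.punctuation)  # drop leading/trailing punctuation
        if not word:
            continue  # the token was punctuation only
        w = word.lower()
        # != on two booleans acts as exclusive or: one end matches, not both.
        if w.startswith(letter) != w.endswith(letter):
            result.append(word)
    return result

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_stripped(mystring, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```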
Update (using regex)
import re
result1 = re.findall(r'\b[t]\w+|\w+[t]\b', mystring, re.I)
result2 = re.findall(r'\b[t]\w+[t]\b', mystring, re.I)
print([x for x in result1 if x not in result2])
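Explanation
The regexes \b[t]\w+ and \w+[t]\b find the words that start or end with the letter t, while \b[t]\w+[t]\b finds the words that both start and end with the letter t.
Once the two lists have been generated, keep only the words from the first list that do not appear in the second (that is, the difference of the two lists).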
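The same two-list idea can be wrapped in a function that accepts any letter. This is a sketch, not the answer's original code: the name `get_words_two_pass` is made up, and `re.escape` is added on the assumption that the caller might pass a character that has a special meaning in a regex.

```python
import re

def get_words_two_pass(sentence, letter):
    """Sketch of the two-list approach for an arbitrary letter (illustrative name)."""
    c = re.escape(letter)  # assumption: guard against regex metacharacters
    # Words that start with the letter OR end with it (at least two characters).
    either = re.findall(r"\b{0}\w+|\w+{0}\b".format(c), sentence, re.I)
    # Words that both start AND end with the letter.
    both = set(re.findall(r"\b{0}\w+{0}\b".format(c), sentence, re.I))
    # List difference: keep the order of the first list, drop anything in the second.
    return [w for w in either if w not in both]

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_two_pass(s, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```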
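Why use a regex for this? Just check the first and last characters.
def getWords(s, letter):
    words = s.split()
    # word[::len(word)-1] keeps the first and last characters ("or 1" avoids a zero step for one-letter words).
    return [a for a, b in ((word, set(word.lower()[::len(word)-1 or 1])) for word in words)
            if letter.lower() in b and len(b) == 2]
If you do want to use a regex, then use: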
regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
The replace is there to avoid writing the verbose +letter+ concatenation over and over.
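For comparison, here is a small sketch showing that the placeholder template produces exactly the same pattern as manual concatenation. The variable names `by_concat` and `by_replace` are my own, and the letter is fixed to "t" only for the example.

```python
letter = "t"  # example letter

# Building the pattern by concatenation repeats the letter variable four times.
by_concat = (r'\b(' + letter + r'\w*[^' + letter + r'\W]|[^' + letter
             + r'\W]\w*' + letter + r')\b')

# Building it from a template: '#' is a placeholder swapped for the letter.
by_replace = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)

print(by_replace)               # \b(t\w*[^t\W]|[^t\W]\w*t)\b
print(by_concat == by_replace)  # True
```

The template form only works because '#' does not otherwise appear in the pattern.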
So the code looks like this:
import re
def getWords(sentence, letter):
    regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
    return re.findall(regex, sentence, re.I)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
result = getWords(s, "t")
print(result)
Output:
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
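I have used # as a placeholder for the actual letter; it is replaced in the regex before the regex is used.
\b : word break
\w* : zero or more word characters (letters, digits, or underscores)
[^#\W] : a word character that is not # (the given letter)
| : logical OR. The left side matches words that start with the letter but do not end with it, and the right side matches the opposite case.
You can try the built-in startswith and endswith functions. Note that, as written, the snippet below also returns words that both start and end with the letter, such as 'TART'.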
>>> string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> [i for i in string.split() if i.lower().startswith('t') or i.lower().endswith('t')]
['The', 'TART', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']