Regular expression to find words that start or end with a particular letter

Question

Write a function called getWords(sentence, letter) that takes in a sentence and a single letter, and returns a list of the words that start or end with this letter, but not both, regardless of the letter case.

For example:

>>> s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> getWords(s, "t")
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

My attempt:

regex = (r'[\w]*'+letter+r'[\w]*')
return (re.findall(regex,sentence,re.I))

My output:

['The', 'TART', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'until', 'next']

Answer 1

\b detects word boundaries. Verbose mode allows multi-line regexes and comments. Note that [^\W] is the same as \w, but to match \w except a certain letter, you need [^\W{letter}].

import re

def getWords(s,t):
    pattern = r'''(?ix)           # ignore case, verbose mode
                  \b{letter}      # start with letter
                  \w*             # zero or more additional word characters
                  [^{letter}\W]\b # ends with a word character that isn't letter
                  |               #    OR
                  \b[^{letter}\W] # does not start with a non-word character or letter
                  \w*             # zero or more additional word characters
                  {letter}\b      # ends with letter
                  '''.format(letter=t)
    return re.findall(pattern,s)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s,'t'))

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
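
The note about [^\W] and [^\W{letter}] can be checked in isolation. The following is a minimal sketch, not part of the original answer; the sample strings and the name not_letter are chosen only for illustration.

```python
import re

letter = "t"  # example letter, chosen for illustration

# [^\W] means "not a non-word character", which is exactly \w.
assert re.findall(r"[^\W]", "a_1-!") == re.findall(r"\w", "a_1-!")

# Putting the letter inside the negated class also excludes that letter.
# With re.I, both 't' and 'T' are excluded.
not_letter = re.compile(r"[^\W{letter}]".format(letter=letter), re.I)
print(not_letter.findall("Tart_9!"))  # ['a', 'r', '_', '9']
```

This is why the pattern rejects TART: its last character is a word character, but it is the letter itself.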

Answer 2

This is much easier with the startswith() and endswith() methods.

def getWords(s, letter):
    letter = letter.lower()
    # keep words that start or end with the letter, but not both
    return ([word for word in s.split() if (word.lower().startswith(letter) or 
                word.lower().endswith(letter)) and not 
                    (word.lower().startswith(letter) and word.lower().endswith(letter))])

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))

Output

['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']

Update (using regular expression)

import re
result1 = re.findall(r'\b[t]\w+|\w+[t]\b', mystring, re.I)
result2 = re.findall(r'\b[t]\w+[t]\b', mystring, re.I)
print([x for x in result1 if x not in result2])

Explanation

The regular expressions \b[t]\w+ and \w+[t]\b find words that start or end with the letter t, and \b[t]\w+[t]\b finds words that both start and end with the letter t.

After generating the two lists, keep only the words from the first list that do not appear in the second, i.e. take their difference.
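
The two-pass idea can also be wrapped in a function that takes the letter as a parameter. This is only a sketch under that assumption; the name get_words_two_pass is hypothetical and re.escape is added as a precaution, neither comes from the original answer.

```python
import re

def get_words_two_pass(sentence, letter):
    """Words that start or end with `letter`, minus those that do both."""
    l = re.escape(letter)  # precaution in case the character is special in a regex
    # Pass 1: words that start with the letter OR end with it.
    either = re.findall(r'\b{0}\w+|\w+{0}\b'.format(l), sentence, re.I)
    # Pass 2: words that start AND end with the letter.
    both = set(re.findall(r'\b{0}\w+{0}\b'.format(l), sentence, re.I))
    # Difference: keep the pass-1 words that pass 2 did not find.
    return [w for w in either if w not in both]

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_two_pass(s, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```

Unlike the split() version, \b stops at punctuation, so Thursdays comes back without its trailing comma.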

Answer 3

Why are you using regex for this? Just check the first and last character.

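def getWords(s, letter):
    letter = letter.lower()
    words = s.split()
    # b is the set of lowercased first and last characters; len(b)==2 means they differ
    return [a for a,b in ((word, {word[0].lower(), word[-1].lower()}) for word in words) if letter in b and len(b)==2]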
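
The same first-and-last-character check may be easier to follow with explicit comparisons. The following is a sketch of that idea rather than the answer's own code; get_words_xor is a hypothetical name.

```python
def get_words_xor(sentence, letter):
    letter = letter.lower()
    result = []
    for word in sentence.split():  # split() never yields empty strings
        starts = word[0].lower() == letter
        ends = word[-1].lower() == letter
        # != on two booleans acts as exclusive or: exactly one may be true.
        if starts != ends:
            result.append(word)
    return result

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_xor(s, "t"))
# ['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
```

Because the words come from split(), punctuation stays attached, which is why 'Thursdays,' keeps its comma here.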

Answer 4

If you want the regex for this, then use:

regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)

The replace is done to avoid repeating the verbose +letter+ concatenation.

So the code then looks like this:

import re

def getWords(sentence, letter):
    regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
    return re.findall(regex, sentence, re.I)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
result = getWords(s, "t")
print(result)

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

Explanation

I have used # as a placeholder for the actual letter; it is replaced in the regular expression before the expression is used.

\b : word boundary
\w* : 0 or more word characters (letters, digits, or underscores)
[^#\W] : a word character that is not # (the given letter)
| : logical OR. The left side matches words that start with the letter but don't end with it, and the right side matches the opposite case.
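
To see what each side of the | contributes, the two alternatives can be run separately. This is an illustrative sketch only; the names left, right, starts_only and ends_only are not from the original answer.

```python
import re

letter = "t"
s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."

# Left side: starts with the letter, ends with some other word character.
left = r'\b#\w*[^#\W]\b'.replace('#', letter)
# Right side: starts with some other word character, ends with the letter.
right = r'\b[^#\W]\w*#\b'.replace('#', letter)

starts_only = re.findall(left, s, re.I)
ends_only = re.findall(right, s, re.I)

print(starts_only)  # ['The', 'Tuesdays', 'Thursdays']
print(ends_only)    # ['but', 'it', 'not', 'start', 'next']
```

Neither side can match TART, since its first and last characters are both the letter, so the combined pattern skips it as well.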

Answer 5

You can try the built-in startswith and endswith string methods.

>>> string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> [i for i in string.split() if i.lower().startswith('t') or i.lower().endswith('t')]
['The', 'TART', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
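
The result above still contains 'TART' and keeps the comma on 'Thursdays,'. One possible way to add the "but not both" condition and drop surrounding punctuation is sketched below; get_words_methods is a hypothetical name, and stripping with string.punctuation is an assumption about how punctuation should be handled.

```python
import string

def get_words_methods(sentence, letter):
    letter = letter.lower()
    result = []
    for token in sentence.split():
        # Remove leading and trailing punctuation such as ',' or '.'.
        word = token.strip(string.punctuation)
        lowered = word.lower()
        # Keep the word only if exactly one of the two checks is true.
        if lowered.startswith(letter) != lowered.endswith(letter):
            result.append(word)
    return result

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_methods(s, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```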

Answer 1  4  2017-02-19 22:41:40
Answer 2  1  2017-02-19 22:39:09
Answer 3  1  2017-02-19 22:39:16
Answer 4  1  accepted  2017-02-19 22:44:15
Answer 5  0  2017-02-19 22:39:42