Regex to match a string with a minimum number of words
I need a regex to match a string only if it contains at least X words (where a word is defined as any continuous non-whitespace sequence).
I am using re.findall().
You could use \S+ (one or more characters from the class \S) to designate a word.
\S is equivalent to [^\s], which, for ASCII text, is itself equivalent to [^ \v\t\f\n\r] (in the order I typed them: space, vertical tab, horizontal tab, form feed, newline, carriage return).
[^...] denotes a negated character class: it matches any single character except those listed inside the class.
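A quick check of these definitions with Python's re module. The sample strings are made up for illustration. It also shows a caveat: for str patterns in Python 3, \s matches Unicode whitespace too, so \S only agrees exactly with [^ \t\n\r\f\v] when the re.ASCII flag is set.

```python
import re

text = "  one two\tthree\nfour  "

# \S+ matches each maximal run of non-whitespace characters, i.e. one "word"
words = re.findall(r"\S+", text)
print(words)  # ['one', 'two', 'three', 'four']

# With re.ASCII, \S+ and the explicit negated class give the same matches
explicit = re.findall(r"[^ \t\n\r\f\v]+", text)
print(explicit == re.findall(r"\S+", text, re.ASCII))  # True

# Without re.ASCII, \s also covers Unicode whitespace such as U+00A0 (no-break space),
# so the two patterns can disagree on non-ASCII input
nbsp_text = "one\u00a0two"
print(len(re.findall(r"\S+", nbsp_text)))              # 2
print(len(re.findall(r"[^ \t\n\r\f\v]+", nbsp_text)))  # 1
```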
Now, for what you're trying to do, I would rather use re.match, like so:
re.match(r'\s*\S+(?:\s+\S+){%d,}' % (X - 1), text_to_validate)
(?:\s+\S+) matches one or more whitespace characters followed by a word.
{X-1,} means that the group (?:\s+\S+) must appear at least X-1 times for the pattern to match; together with the leading \S+, that makes at least X words. If X=4, it becomes {3,}.
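A minimal sketch wrapping the re.match approach in a helper. The function name has_at_least_words is hypothetical, and rejecting x < 1 is an assumption of this sketch. The pattern is built with % formatting so the literal braces of {n,} need no escaping, as they would in an f-string.

```python
import re

def has_at_least_words(text: str, x: int) -> bool:
    """Return True if text contains at least x whitespace-separated words (x >= 1)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    # One leading word, then the group (?:\s+\S+) repeated at least x-1 times
    pattern = r"\s*\S+(?:\s+\S+){%d,}" % (x - 1)
    # re.match anchors only at the start, so text after the x-th word is allowed
    return re.match(pattern, text) is not None

print(has_at_least_words("the quick brown fox", 4))  # True
print(has_at_least_words("  the quick brown  ", 4))  # False
```

Because \s and \S never match the same character, the repeated group cannot backtrack into overlapping alternatives, so the pattern stays cheap even for large X.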
Alternatively, split on whitespace and count the number of elements:
len(re.split(r"\s+", text_to_validate.strip())) >= X
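A small illustration of why stripping matters when splitting on \s+, compared with str.split() with no argument, which splits on runs of whitespace and drops empty strings. The sample text and the threshold X = 4 are assumptions for illustration.

```python
import re

text = "  the quick brown fox  "

# re.split on \s+ yields empty strings when the text starts or ends with whitespace
print(re.split(r"\s+", text))          # ['', 'the', 'quick', 'brown', 'fox', '']

# Stripping first avoids that (an empty or all-whitespace string still gives [''])
print(re.split(r"\s+", text.strip()))  # ['the', 'quick', 'brown', 'fox']

# str.split() with no argument splits on runs of whitespace and drops empty strings
print(text.split())                    # ['the', 'quick', 'brown', 'fox']

X = 4  # hypothetical minimum word count
print(len(text.split()) >= X)          # True
```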
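Another approach uses re.findall() directly, as in the question:

import re

subject = """I need a regex to match a string only if it contains at least X words.
Where a word is defined as any continuous non-whitespace sequence.
I am using Python 3 and re.findall()"""

# Each match of \S+ is one word (a run of non-whitespace characters)
result = re.findall(r"\S+", subject)
# len(result) > 5 means "at least 6 words"; use >= X for a general threshold
if len(result) > 5:
    print("yes")
else:
    print("no")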
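If the text can be long, a variation (swapping re.findall for re.finditer, which yields matches lazily) can stop as soon as X words have been seen instead of collecting every word. The name has_min_words_lazy is hypothetical; this is a sketch, not part of the original answers.

```python
import re
from itertools import islice

def has_min_words_lazy(text: str, x: int) -> bool:
    """Return True once x words (runs of non-whitespace) have been found."""
    # finditer produces matches one at a time; islice stops after the first x
    found = sum(1 for _ in islice(re.finditer(r"\S+", text), x))
    return found >= x

subject = "I need a regex to match a string only if it contains at least X words."
print(has_min_words_lazy(subject, 6))    # True
print(has_min_words_lazy("one two", 3))  # False
```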