
Regex to match a string with a minimum number of words

I need a regex that matches a string only if it contains at least X words, where a word is defined as any continuous sequence of non-whitespace characters.

I am using re.findall().

Hmm, you could use the shorthand character class \S with a + quantifier, i.e. \S+, to designate a word.

\S is equivalent to [^\s], which for ASCII text is itself equivalent to [^ \v\t\f\n\r] (in the order I typed them: space, vertical tab, horizontal tab, form feed, newline, carriage return). With Python 3 str patterns, \s also covers other Unicode whitespace unless the re.ASCII flag is set.

[^ ... ] denotes a negated class: it matches any single character except those listed inside the class.
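To make the equivalence concrete, here is a minimal sketch that runs both spellings over the same input. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample containing a tab and a newline as separators.
sample = "two\twords\n"

# \S+ and the negated class [^\s]+ pick out the same runs of non-whitespace.
words_a = re.findall(r"\S+", sample)
words_b = re.findall(r"[^\s]+", sample)

print(words_a)             # ['two', 'words']
print(words_a == words_b)  # True
```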

Now, for what you're trying to do, I would rather use re.match, like so:

re.match(r'\s*\S+(?:\s+\S+){X-1,}', text_to_validate)

(?:\s+\S+) is a non-capturing group that matches one or more whitespace characters followed by a word.

{X-1,} means that the group (?:\s+\S+) must appear at least X-1 times for the pattern to match. X-1 is a placeholder here, so substitute the actual number: if X=4, it becomes {3,}.

ideone demo
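Since {X-1,} is not valid regex syntax as written, the number has to be filled in before the pattern is used. One way to do that is sketched below. The helper name has_min_words and its parameters are assumptions for illustration, not part of the original answer.

```python
import re

def has_min_words(text, min_words):
    """Return True if text contains at least min_words whitespace-separated words."""
    if min_words < 1:
        # Zero (or fewer) required words: every string qualifies.
        return True
    # Fill in the X-1 placeholder: one leading word plus (min_words - 1) more.
    pattern = r"\s*\S+(?:\s+\S+){%d,}" % (min_words - 1)
    return re.match(pattern, text) is not None

print(has_min_words("one two three four", 4))  # True
print(has_min_words("one two three", 4))       # False
```

re.match only anchors at the start of the string, which is enough here: once X words have been found, whatever follows does not matter.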


Alternatively, split on whitespace and count the number of elements:

re.split(r"\s+", text_to_validate)

ideone demo
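One caveat with the split approach: if the text starts or ends with whitespace, re.split returns empty strings at the edges, which would inflate the count. A minimal sketch that guards against this follows. The helper name count_words is hypothetical.

```python
import re

def count_words(text):
    """Count whitespace-separated words using re.split."""
    # Strip first: otherwise leading/trailing whitespace yields empty strings.
    stripped = text.strip()
    if not stripped:
        return 0
    return len(re.split(r"\s+", stripped))

sample = "  one two   three \n"
print(count_words(sample))   # 3
print(len(sample.split()))   # 3, str.split() with no arguments behaves the same way
```

For this use case, str.split() with no arguments is probably the simpler choice, since it already splits on runs of whitespace and drops empty strings.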

import re

subject = """I need a regex to match a string only if it contains at least X words.
Where a word is defined as any continuous non-whitespace sequence.
I am using Python 3 and re.findall()"""

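# Each match of \S+ is one word (a run of non-whitespace characters).
result = re.findall(r"\S+", subject)

# Require more than 5 words, i.e. at least 6; adjust the threshold for your X.
if len(result) > 5:
    print("yes")
else:
    print("no")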

http://labs.codecademy.com/
