I need a regex to match a string only if it contains at least X words (where a word is defined as any continuous non-whitespace sequence).
I am using re.findall().
Hmm, you could use \S+ to designate a word. \S is a shorthand character class equivalent to [^\s], which for ASCII text is itself equivalent to [^ \v\t\f\n\r] (in the order I typed them: space, vertical tab, horizontal tab, form feed, newline, carriage return). In Python 3, \s also matches other Unicode whitespace unless the re.ASCII flag is set. [^ ... ] indicates a negated class, which matches any character except those inside the class.
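As a quick sanity check of the equivalence described above, here is a small sketch. The sample string and variable names are made up for illustration, and the sample is ASCII-only on purpose.

```python
import re

# Hypothetical ASCII-only sample with mixed whitespace between words.
sample = "one  two\tthree\nfour"

# Shorthand class, negated shorthand, and the explicit ASCII whitespace list.
words_shorthand = re.findall(r"\S+", sample)
words_negated = re.findall(r"[^\s]+", sample)
words_explicit = re.findall(r"[^ \v\t\f\n\r]+", sample)

print(words_shorthand)  # ['one', 'two', 'three', 'four']
print(words_shorthand == words_negated == words_explicit)  # True
```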
Now, for what you're trying to do, I would rather use re.match, like so:
re.match(r'\s*\S+(?:\s+\S+){X-1,}', text_to_validate)
(?:\s+\S+) matches one or more whitespace characters followed by a word. {X-1,} means that the group (?:\s+\S+) should appear at least X-1 times to match. Note that X-1 is a placeholder, so you have to substitute the actual number: if X=4, then it becomes {3,}.
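Since {X-1,} is not valid regex syntax as written, the number has to be filled in before the pattern is compiled. A minimal sketch of one way to do that, assuming a helper named has_at_least_n_words (the name and signature are made up for this example):

```python
import re

def has_at_least_n_words(text, n):
    """Return True if text contains at least n whitespace-separated words."""
    if n < 1:
        return True  # every string contains at least zero words
    # Build the pattern with the concrete repetition count n - 1.
    pattern = r"\s*\S+(?:\s+\S+){%d,}" % (n - 1)
    # re.match anchors at the start of the string; None means no match.
    return re.match(pattern, text) is not None

print(has_at_least_n_words("one two three four", 4))  # True
print(has_at_least_n_words("one two three", 4))       # False
```

re.match returns a match object or None, so comparing against None gives a plain boolean that is easy to use in an if statement.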
Alternatively, split on whitespace and count the number of elements:
re.split(r"\s+", text_to_validate)
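One caveat with the split approach: if the text starts or ends with whitespace, re.split returns empty strings at the edges, which inflates the count. A short sketch of the difference, swapping in the built-in str.split() (not part of the answer above) as a possible alternative; the sample text is made up:

```python
import re

text_to_validate = "  one two  three "

# re.split keeps empty strings for leading and trailing whitespace.
pieces = re.split(r"\s+", text_to_validate)
print(pieces)  # ['', 'one', 'two', 'three', '']

# str.split() with no arguments splits on whitespace runs and drops the empties.
words = text_to_validate.split()
print(len(words))  # 3
```

Stripping the text first, or filtering out empty strings, would also give the right count while still using re.split.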
import re

subject = """I need a regex to match a string only if it contains at least X words.
Where a word is defined as any continuous non-whitespace sequence.
I am using Python 3 and re.findall()"""

# Each match of \S+ is one word, so the length of the list is the word count.
result = re.findall(r"\S+", subject)
if len(result) > 5:  # more than 5 words
    print("yes")
else:
    print("no")