
Regex to match a string with a minimum number of words

I need a regex to match a string only if it contains at least X words (where a word is defined as any continuous non-whitespace sequence).

I am using re.findall().

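You could use \S+ to designate a word: the character class \S, repeated one or more times.

\S is equivalent to [^\s], which for ASCII text is itself equivalent to [^ \v\t\f\n\r] (in the order I typed them: space, vertical tab, horizontal tab, form feed, newline, carriage return). Note that for str patterns in Python 3, \s also matches other Unicode whitespace characters.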

[^ ... ] denotes a negated class: it matches any single character except those listed inside the class.
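As a quick check of the equivalence described above, here is a minimal sketch comparing the shorthand with the negated class. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample containing spaces, a tab and a newline.
sample = "two  words\tand\nmore"

# \S+ and [^\s]+ should find the same runs of non-whitespace characters.
words_shorthand = re.findall(r"\S+", sample)
words_negated = re.findall(r"[^\s]+", sample)

print(words_shorthand)                   # ['two', 'words', 'and', 'more']
print(words_shorthand == words_negated)  # True
```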

Now, for what you're trying to do, I would rather use re.match like so:

re.match(r'\s*\S+(?:\s+\S+){X-1,}', text_to_validate)

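(?:\s+\S+) is a non-capturing group that matches one or more whitespace characters followed by a word.

{X-1,} is a placeholder meaning that the group (?:\s+\S+) should appear at least X-1 times to match; substitute the actual number before using the pattern. If X=4, then it becomes {3,}.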
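Since {X-1,} has to be replaced with a concrete number, one way to do that is to build the pattern from a variable. The following is only a sketch: the helper name has_min_words and its parameters are illustrative assumptions, not part of the original answer.

```python
import re

def has_min_words(text, min_words):
    """Return True if text contains at least min_words words.

    A word is any run of non-whitespace characters. The function name
    and signature are hypothetical, chosen for this example.
    """
    if min_words < 1:
        return True  # zero words required: every string qualifies
    # One leading word, then at least (min_words - 1) further words.
    pattern = r"\s*\S+(?:\s+\S+){%d,}" % (min_words - 1)
    return re.match(pattern, text) is not None

print(has_min_words("one two three four", 4))  # True
print(has_min_words("one two three", 4))       # False
```

re.match only anchors at the start of the string, so any text after the X-th word does not affect the result.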



Alternatively, split on whitespace and count the number of elements:

re.split(r"\s+", text_to_validate)
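One caveat when counting the pieces: re.split returns empty strings when the text starts or ends with whitespace, so those should be filtered out before counting. A small sketch, using a made-up input string:

```python
import re

# Hypothetical input with leading and trailing whitespace.
text_to_validate = "  one two  three "

parts = re.split(r"\s+", text_to_validate)
print(parts)  # ['', 'one', 'two', 'three', '']

# Drop the empty strings produced at the edges before counting.
words = [p for p in parts if p]
print(len(words))  # 3

# str.split() with no arguments splits on whitespace runs and
# discards empty strings, so it gives the same count directly.
print(len(text_to_validate.split()))  # 3
```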


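import re

subject = """I need a regex to match a string only if it contains at least X words.
Where a word is defined as any continuous non-whitespace sequence.
I am using Python 3 and re.findall()"""

# Each run of non-whitespace characters counts as one word.
result = re.findall(r"\S+", subject)

# X, the minimum number of words; 6 preserves the original "> 5" check.
min_words = 6

# print() is a function in Python 3.
if len(result) >= min_words:
    print("yes")
else:
    print("no")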

http://labs.codecademy.com/


 