
Python regular expression. Find a sentence in a sentence

I'm trying to find the expression "K others" in the sentence "Chris and 34K others".

I tried a regular expression, but it doesn't match :(

import re


value = "Chris and 34K others"

m = re.search("(.K.others.)", value)

if m:
    print("it is true")
else:
    print("it is not")

Guessing that you're scraping a web page for something like "you and 34k others liked this" on Facebook, and that you're wrapping "K others" in a capture group to get at the number, I'll jump straight to how to get the number:

import re

value = "Chris and 34K others blah blah"

# the regex describes:
# a leading whitespace character, one or more digits, commas or dots
# (to catch punctuation inside the number), optional whitespace,
# then a trailing 'K others' in any capitalisation
m = re.search(r"\s([\d.,]+)\s*K others", value, re.IGNORECASE)

if m:
    captured_values = m.groups()
    print("Number of others:", captured_values[0], "K")
else:
    print("it is not")

Try this code on repl.it

This should also cover an uppercase or lowercase K, numbers with commas (1,100K others), spaces between the number and the K, and it works whether or not there is text after 'others'.
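
Those cases can be checked by running a pattern over a few variants. This is only a sketch: the helper name count_others and the sample strings are made up for illustration, and the character class [\d.,]+ is an assumption about what a "number" looks like on the page.

```python
import re

# Assumed pattern: whitespace, a run of digits/commas/dots, optional
# whitespace, then "K others" in any capitalisation.
PATTERN = re.compile(r"\s([\d.,]+)\s*K others", re.IGNORECASE)

def count_others(text):
    """Return the captured number as a string, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

# Hypothetical sample inputs covering the cases listed above.
samples = [
    "Chris and 34K others",
    "you and 34k others liked this",
    "Chris and 1,100K others",
    "Chris and 34 K others blah blah",
    "Chris and others",
]
for s in samples:
    print(repr(s), "->", count_others(s))
```

Note that the leading \s means a number at the very start of the string would not be picked up; that may or may not matter for your input.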

You should use search rather than match unless you expect your regular expression to match at the beginning. The help string for re.match mentions that the pattern is applied at the start of the string.
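
A small sketch of the difference, using the sentence from the question (the patterns here are simplified for illustration):

```python
import re

value = "Chris and 34K others"

# re.match only tries the pattern at position 0 of the string,
# so a pattern that starts with "K" cannot match here.
at_start = re.match(r"K others", value)
print(at_start)

# re.search scans forward and reports the first match anywhere.
m = re.search(r"K others", value)
print(m.group(0), m.span())
```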

If you want to match something within the string, use re.search; re.match only matches at the beginning. Also, change your RegEx to (K.others). The last . ruins the RegEx because it requires a character after "others" and there is none, and the first . just matches whatever character comes before the K. I removed both:

>>> bool(re.search("(K.others)", "Chris and 34K others"))
True

The RegEx (K.others) matches:

Chris and 34K others
            ^^^^^^^^

As opposed to (.K.others.), which matches nothing here. You can use (.K.others) as well, which also matches the character before the K:

Chris and 34K others
           ^^^^^^^^^      
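
To see the three variants side by side, here is a short sketch. The second input string, with text after "others", is a made-up variant added only to show when the trailing . can match.

```python
import re

short = "Chris and 34K others"
longer = "Chris and 34K others liked this"  # hypothetical variant with trailing text

results = {}
for pattern in (r"(.K.others.)", r"(K.others)", r"(.K.others)"):
    for text in (short, longer):
        m = re.search(pattern, text)
        # Store the captured group, or None when the pattern does not match.
        results[(pattern, text)] = m.group(1) if m else None
        print(pattern, "|", text, "->", repr(results[(pattern, text)]))
```

Only (.K.others.) fails on the original sentence, because its final . needs one more character than the string has.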

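Also, you can use \s instead of . so that only a whitespace character is accepted between K and others: (K\sothers). In a regular (non-raw) Python string literal the backslash has to be doubled, i.e. "(K\\sothers)". This matches a literal K, one whitespace character, and others.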

Now, if you want to capture everything before and everything after as well, try: (.+)?(K\sothers)(\s.+)? (again with doubled backslashes in a non-raw string literal). Here's a link to repl.it . You can get the number with this .
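
As a rough sketch of what the three groups contain, here is that pattern applied to the sentence from the question and to a made-up longer variant:

```python
import re

pattern = re.compile(r"(.+)?(K\sothers)(\s.+)?")

# Hypothetical input with text after "others".
with_tail = pattern.search("Chris and 34K others blah blah").groups()
print(with_tail)

# The original sentence: the optional third group does not take part,
# so it comes back as None.
without_tail = pattern.search("Chris and 34K others").groups()
print(without_tail)
```

The first group holds everything before the K, so the number still has to be pulled out of its tail end in a separate step.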
