
Parsing out a string that is surrounded by quotes and has the word “…” in it, then printing the string (Python regex)

I'm trying to use a regex to extract the first string that is surrounded by quotes and contains the word "quoted", and then print it. The output should be "This is a quoted message"

import re
text = 'I have "message with quotes" in it. "This is a quoted message."'
r = re.search('".*?quoted.*?"', text)  # wrong: matches too much
if r is None:
    print("not found")
else:
    print(r.group(0))  # print the whole matched string

First, let me say that an online editor like regex101.com could really come in handy here.

Second, here's a working regex string:

'"[^"]*quoted[^"]*"'

Let me explain what's going on:

When you used '".*?quoted.*?"' and got "message with quotes" in it. "This is a quoted message.", what happened was that .*? also matched quotation marks. Even though the quantifier is lazy, the regex engine scans from left to right and tries the earliest possible starting position first, so the match begins at the first " in the text and expands to the right until it reaches quoted. In the working pattern, I've replaced . with [^"], which matches any character except a quotation mark. Since [^"]* cannot match ", the match can no longer run across a closing quote, and the expected string is produced.
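To make the difference concrete, here is a small sketch that runs both patterns against the text from the question. The variable names lazy and strict are only illustrative.

```python
import re

text = 'I have "message with quotes" in it. "This is a quoted message."'

# Lazy dot: the match still starts at the first quote in the text,
# because the engine tries the leftmost starting position first.
lazy = re.search(r'".*?quoted.*?"', text)
print(lazy.group(0))
# "message with quotes" in it. "This is a quoted message."

# Negated character class: the match cannot run across a quotation mark.
strict = re.search(r'"[^"]*quoted[^"]*"', text)
print(strict.group(0))
# "This is a quoted message."
```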

Your regexp is nearly correct. There are two things you need:

  • Use parentheses to capture the bit that you are interested in keeping - this can then be used with the group method. The " characters are not included in the capture group. (I am assuming here that where you use them in the question, it is only to quote the string and not to imply that you want them as part of the output.)

  • Your first .*? will match any amount of any (non-newline) character, and even though it is "lazy" rather than greedy, it will still find the match with the first available starting position, which may cause it to include " characters themselves. It should be replaced by [^"]*? to ensure that these are not matched. (The second .*? can be changed similarly or left as-is; it shouldn't matter, as in that case the lazy quantifier is enough to ensure that it doesn't match any " characters.)

import re
text = 'I have "message with quotes" in it. "This is a quoted message."'
# Group 1 captures the text between the quotes, without the quotes themselves
r = re.search('"([^"]*?quoted.*?)"', text)
if r is None:
    print("not found")
else:
    print(r.group(1))

This gives:

This is a quoted message.
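If the search word is not fixed, the same idea can be wrapped in a small helper. This is only a sketch: the function name find_quoted_containing is hypothetical, and it assumes the word should be matched literally, which is why it is passed through re.escape.

```python
import re

def find_quoted_containing(text, word):
    """Return the first double-quoted string in text that contains word
    (without the surrounding quotes), or None if there is no such string."""
    # re.escape keeps regex metacharacters in the word from being interpreted.
    pattern = r'"([^"]*' + re.escape(word) + r'[^"]*)"'
    match = re.search(pattern, text)
    return match.group(1) if match else None

text = 'I have "message with quotes" in it. "This is a quoted message."'
print(find_quoted_containing(text, "quoted"))   # This is a quoted message.
print(find_quoted_containing(text, "missing"))  # None
```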
