
How can I find a substring in a target string more precisely in Python?

I know that 'in' can find a substring in another string, as described in "How to determine whether a substring is in a different string".

But I could not figure out how to match the substring precisely in the example below:

text = '"Peter,just say hello world." Mary said "En..."' 

I want to check whether 'Peter' appears in text outside of any "XXXX" (double-quoted) content. If I use

if 'Peter' in text:
    print('yes')
else:
    print('no')

then it prints 'yes', which is not what I want, because 'Peter' only appears inside "XXXX".

Besides solving this problem, I want to extract the content both outside and inside the quotes. For example, 'Mary' is in text but not in any "XXXX" content, and I also want to get "Peter,just say hello world.".

For requirements this specific, I think processing the text character by character is a good approach, and it is also good practice for your string-processing skills. For this problem, you can use a stack to keep track of double quotation marks, so that you can tell whether each character is inside a pair of double quotes.
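A minimal sketch of the character-by-character approach in Python 3. The function name split_by_quotes is hypothetical, and it assumes double quotes are never escaped (no \" inside a quoted section). Because the same character both opens and closes a section, the stack never holds more than one element, so a boolean flag would work just as well.

```python
def split_by_quotes(text):
    """Split text into (unquoted, quoted) segment lists, one character at a time.

    Hypothetical helper; assumes double quotes are never escaped.
    """
    unquoted, quoted = [], []
    stack = []    # holds the currently open '"', if any
    current = []  # characters of the segment being built
    for ch in text:
        if ch == '"':
            if stack:
                # Closing quote: the collected characters were quoted.
                stack.pop()
                quoted.append(''.join(current))
            else:
                # Opening quote: the collected characters were unquoted.
                stack.append(ch)
                if current:
                    unquoted.append(''.join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Leftover text (including text after an unmatched opening quote)
        # is treated as unquoted.
        unquoted.append(''.join(current))
    return unquoted, quoted


text = '"Peter,just say hello world." Mary said "En..."'
unquoted, quoted = split_by_quotes(text)
print(unquoted)                              # [' Mary said ']
print(quoted)                                # ['Peter,just say hello world.', 'En...']
print(any('Peter' in s for s in unquoted))   # False
```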

As with many string-processing problems, regular expressions are your friend here. One way to handle this problem is to start at the front of the string and process it incrementally.

Check the start of the string to see whether it's unquoted or quoted text. If it's unquoted, pull all the unquoted text off until you hit a quote. If it's quoted text, pull off everything until you hit an end quote. Keep processing the text until all the text has been processed and categorized as either quoted or unquoted.

You'll then have two separate lists of quoted and unquoted text strings. You can then do string inclusion checks in either list.

import re

text = '"Peter,just say hello world." Mary said "En..."'

unquoted_text = []
quoted_text = []

while text:
    # Pull unquoted text off the front.
    # re.DOTALL lets (.*) span newlines; without it, multi-line text can loop forever.
    m = re.match(r'^([^"]+)(.*)$', text, re.DOTALL)
    if m:
        unquoted_text.append(m.group(1))
        text = m.group(2)

    # Pull quoted text off the front
    m = re.match(r'^"([^"]*)"(.*)$', text, re.DOTALL)
    if m:
        quoted_text.append(m.group(1))
        text = m.group(2)

    # Just in case there is a single unmatched double quote (bad!),
    # categorize the rest as unquoted
    m = re.match(r'^"([^"]*)$', text)
    if m:
        unquoted_text.append(m.group(1))
        text = ''

print('UNQUOTED')
print(unquoted_text)

print('QUOTED')
print(quoted_text)

is_peter_in_quotes = any('Peter' in t for t in quoted_text)
is_peter_outside_quotes = any('Peter' in t for t in unquoted_text)
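If you only need the inclusion checks rather than an ordered list of segments, a shorter alternative (a different technique from the loop above, using re.findall and re.sub) is sketched below. Like the other answers, it assumes quotes are never escaped; unlike the loop, a stray unmatched quote is simply left in the unquoted text.

```python
import re

text = '"Peter,just say hello world." Mary said "En..."'

# Every "..." pair, without the surrounding quotes.
quoted = re.findall(r'"([^"]*)"', text)

# Replace each quoted section with a space, leaving only the unquoted text.
unquoted = re.sub(r'"[^"]*"', ' ', text)

print(quoted)               # ['Peter,just say hello world.', 'En...']
print('Peter' in unquoted)  # False
print('Mary' in unquoted)   # True
```

Replacing each quoted section with a space rather than an empty string keeps the text on either side from being glued together: with an empty replacement, 'Pe"x"ter' would become 'Peter' and produce a false match.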


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM