How do I store a quoted string that contains two words as a single list item?

I wrote some search-parsing code, and I want everything between " " to be stored as a single item in the list. How can I do that? The code builds three lists, but the second one, should, does not come out the way I want.

import re

message='read read read'

others = ' '.join(re.split(r'\(.*\)', message))
others_split = others.split()

to_compile = re.compile(r'.*\((.*)\).*')
to_match = to_compile.match(message)
ors_string = to_match.group(1)

should = ors_string.split(' ')

must = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and not term.startswith('-')]

must_not = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and term.startswith('-')]
must_not = [s.replace("-", "") for s in must_not]

print(f'must: {must}')
print(f'should: {should}')
print(f'must_not: {must_not}')

Output:

must: ['read', '"find find"', 'within', '"plane"']
should: ['"exactly', 'needed"', 'empty']
must_not: ['russia', '"destination good"']

Wanted result:

must: ['read', '"find find"', 'within', '"plane"']
should: ['"exactly needed"', 'empty'] <---
must_not: ['russia', '"destination good"']

After editing the message to 'read read read' (as in the code above), I get the following error. How should I handle it?

Traceback (most recent call last):
    ors_string = to_match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Your should list is built by splitting on whitespace (should = ors_string.split(' ')), which is why the quoted phrase is broken into two items. The following code gives you the output you requested, but I'm not sure that it solves your problem for all future inputs.

import re

message = 'read "find find":within("exactly needed" OR empty) "plane" -russia -"destination good"'

others = ' '.join(re.split(r'\(.*\)', message))
others_split = others.split()

to_compile = re.compile(r'.*\((.*)\).*')
to_match = to_compile.match(message)
ors_string = to_match.group(1)

# Split on OR as a standalone word instead of on whitespace, so quoted
# phrases stay intact and words that merely contain "OR" are not split.
should = re.split(r'\s+OR\s+', ors_string)

# Remove any leading or trailing whitespace left after the split.
should = [word.strip() for word in should]

must = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and not term.startswith('-')]

must_not = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and term.startswith('-')]
# Drop only the leading "-" so hyphens inside a quoted phrase are kept.
must_not = [s[1:] for s in must_not]

print(f'must: {must}')
print(f'should: {should}')
print(f'must_not: {must_not}')
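The AttributeError in the question occurs because match() returns None when the message contains no parentheses, so .group(1) is called on None. Below is a minimal sketch of one way to handle both problems at once. The helper name parse_should is hypothetical (it is not from the original code), and it assumes that the should terms live in the first "(...)" group and that quoted phrases never contain a closing parenthesis. Unlike the greedy pattern above, it uses a non-greedy search, so treat it as an illustration, not a drop-in replacement.

```python
import re

# Hypothetical helper (not part of the original post): extract the OR-terms
# from the first "(...)" group, keeping double-quoted phrases in one piece.
def parse_should(message):
    match = re.search(r'\((.*?)\)', message)
    if match is None:
        # No parenthesized group: return an empty list instead of calling
        # .group() on None, which is what raises the AttributeError.
        return []
    # A token is either a double-quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', match.group(1))
    # Drop the OR keyword itself; only a standalone "OR" token is removed.
    return [token for token in tokens if token != 'OR']

message = 'read "find find":within("exactly needed" OR empty) "plane" -russia -"destination good"'
print(parse_should(message))           # ['"exactly needed"', 'empty']
print(parse_should('read read read'))  # []
```

Tokenizing with a "quoted phrase or non-space run" pattern means the result does not depend on how much whitespace surrounds OR, and a word such as ORANGE is left untouched because only a token that is exactly OR is filtered out.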

