
How to remove text within parentheses from Python string?

I am trying to remove parentheses and the text that resides in these parentheses, as well as hyphen characters. Some string examples look like the following:
example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases' ##there are two hyphens

I would like the results to be:

example = 'Year 1.2 Q4.1'  
example2 = 'Year 2-7 Q4.8'  

How can I remove text that is inside parentheses, or that follows special characters such as a hyphen? The only thing I could find was the str.strip() method. I am new to Python, so any feedback is greatly appreciated!

You can use the regex below to get the desired result:

r"\s*\(.*\)|\s-\s.*"
# Pattern 1: \s*\(.*\)  optional whitespace, then '(' and everything up to the last ')'
# Pattern 2: \s-\s.*    whitespace, a '-' hyphen, whitespace, and everything after it

Sample run:

>>> import re
>>> my_regex = r"\s*\(.*\)|\s-\s.*"

>>> example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
>>> example2 = 'Year 2-7 Q4.8 - Data markets and phases'

>>> re.sub(my_regex, "", example)
'Year 1.2 Q4.1'
>>> re.sub(my_regex, "", example2)
'Year 2-7 Q4.8'
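Pattern 1 copes with the nested "(#222)" because ".*" is greedy: it runs to the last ")" in the string. A minimal comparison with the non-greedy ".*?", which stops at the first ")" and leaves the outer one behind (the leading \s* also absorbs the space before "("):

```python
import re

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

# Greedy: '.*' extends to the LAST ')', so the nested group goes in one match
greedy = re.sub(r"\s*\(.*\)", "", example)

# Non-greedy: '.*?' stops at the FIRST ')', so the outer ')' is left over
lazy = re.sub(r"\s*\(.*?\)", "", example)

print(repr(greedy))  # 'Year 1.2 Q4.1'
print(repr(lazy))    # 'Year 1.2 Q4.1)'
```

The flip side of greediness: if a string ever had two separate groups, e.g. "a (b) c (d)", the greedy pattern would also remove the " c " between them.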

Here I am using re.sub(pattern, repl, string, ...), which the documentation describes as follows:

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
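Two of the documented behaviors can be checked directly: a string with no match comes back unchanged, and repl may be a function that receives the match object. A small sketch using only the hyphen part of the pattern; "Year 3.0 Q1.1" and the mark function are made-up inputs for illustration:

```python
import re

pattern = r"\s-\s.*"  # whitespace, hyphen, whitespace, then the rest of the string

# No match: re.sub returns the input string unchanged
unchanged = re.sub(pattern, "", "Year 3.0 Q1.1")

# repl as a function: it is called with each Match object and returns the replacement
def mark(m):
    return " [removed]"

marked = re.sub(pattern, mark, "Year 2-7 Q4.8 - Data markets and phases")

print(unchanged)  # Year 3.0 Q1.1
print(marked)     # Year 2-7 Q4.8 [removed]
```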

We can also do this without regex, using str.split() together with * unpacking into a throwaway variable _.

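example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
# Keep the text before the first '('; *_ collects the remaining pieces, which are ignored
display, *_ = example.split('(')
print(display.strip())  # strip() drops the trailing space; prints: Year 1.2 Q4.1

example2 = 'Year 2-7 Q4.8 - Data markets and phases'  # there are two hyphens
# split('-') gives ['Year 2', '7 Q4.8 ', ' Data markets and phases'],
# so rejoin the first two pieces to restore the hyphen in '2-7'
part_1, part_2, *_ = example2.split('-')
display = part_1 + '-' + part_2
print(display.strip())  # prints: Year 2-7 Q4.8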
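The split-and-rejoin trick depends on knowing how many hyphens come before the part to drop. A sketch of an alternative that avoids counting, using str.partition and assuming the unwanted part always starts with "(" or with " - " (a hyphen surrounded by spaces); clean_title is a hypothetical helper name:

```python
def clean_title(s):
    # partition() returns (head, sep, tail); if sep is absent, head is the whole string
    s = s.partition('(')[0]    # cut at the first '(' if there is one
    s = s.partition(' - ')[0]  # cut at the first ' - ', leaving '2-7' untouched
    return s.strip()

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(clean_title(example))   # Year 1.2 Q4.1
print(clean_title(example2))  # Year 2-7 Q4.8
```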

If your strings are stored one per line in a file, you can try something like this. You may need a little extra data cleaning on the results to get exactly your desired output:

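import re

data = []
# Pattern 1: everything from the first '(' to the last ')'
# Pattern 2: whitespace, a hyphen, and everything after it
pattern = r'\(.+\)|\s-.+'
with open('file.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        if match:  # re.search returns None when nothing matches
            line = line.replace(match.group(), '')
        data.append(line.strip())  # drop leftover spaces and the newline

print(data)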
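The search-then-replace loop can also be written with re.sub, which removes every match and returns non-matching lines unchanged. A sketch assuming the file holds one title per line; io.StringIO stands in for file.txt so the example runs on its own, and the third line is a made-up input with nothing to remove:

```python
import io
import re

# Stand-in for open('file.txt'); iterating over it yields lines just like a file
fake_file = io.StringIO(
    "Year 1.2 Q4.1 (Section 1.5 Report (#222))\n"
    "Year 2-7 Q4.8 - Data markets and phases\n"
    "Year 3.0 Q1.1\n"
)

pattern = re.compile(r'\(.+\)|\s-.+')

# sub() leaves lines without a match untouched, so no None check is needed
data = [pattern.sub('', line).strip() for line in fake_file]

print(data)  # ['Year 1.2 Q4.1', 'Year 2-7 Q4.8', 'Year 3.0 Q1.1']
```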

Here is an example without regex (just to show you how good regex can be by comparison):

The code keeps words until it reaches the first word that starts with Q, and includes that word:

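example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

def clean_string(s):
    # Yield words one at a time, stopping right after the first word starting with 'Q'
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break

print(' '.join(clean_string(example)))  # prints: Year 1.2 Q4.1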
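This generator assumes the Q-code is the last word worth keeping. A quick check of that assumption: it also handles example2, but a string with no word starting with Q (the made-up "Year 3 (draft)") comes back unchanged:

```python
def clean_string(s):
    # Same generator as above: yield words up to and including the first 'Q...' word
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break

example2 = 'Year 2-7 Q4.8 - Data markets and phases'
no_q = 'Year 3 (draft)'  # hypothetical input with no Q-code

print(' '.join(clean_string(example2)))  # Year 2-7 Q4.8
print(' '.join(clean_string(no_q)))      # Year 3 (draft)
```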

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM