
remove contents between brackets from string

I have a string like this:

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'

How can I delete everything that is in brackets, so that the output is:

'word1 word2 word5 word6 word9 word10'

I tried a regular expression, but that doesn't seem to work. Any suggestions?

Best Jacques

import re
s = re.sub(r'\(.*?\)', '', s)

Note that this deletes only the parenthesized text, not the spaces around it. This means you'll be left with a double space between "word2" and "word5". Output from my terminal:

>>> re.sub(r'\(.*?\)', '', s)
'word1 word2  word5 word6  word9 word10'
>>> # -------^ -----------^ (Note double spaces there)

However, that is not the output you asked for. To remove the extra spaces as well, you can do something like this:

>>> re.sub(r'\(.*?\)\ *', '', s)
'word1 word2 word5 word6 word9 word10'

My solution is better just because it deletes extra space character ;-)

re.sub(r"\s\(.*?\)", "", s)

EDIT: You are right, it does not catch all cases. Of course, I can write a more complex expression that tries to take more details into account:

re.sub(r"\s*\(.*?\)\s*", " ", s)

Now the result is the desired string, except that a single " " is left wherever a parenthesized group sat at the very start or end of the original string.
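One way to sidestep the stray-space cases discussed above is to remove the parenthesized groups first and normalize whitespace afterwards. This is only a minimal sketch: the helper name `strip_parens` is made up for illustration, and it assumes the parentheses are not nested and that collapsing every run of whitespace to a single space is acceptable.

```python
import re

def strip_parens(text):
    """Remove non-nested (...) groups, then collapse runs of whitespace."""
    # Replace each group with a space so neighbouring words never fuse.
    without_groups = re.sub(r'\(.*?\)', ' ', text)
    # split() with no arguments splits on any run of whitespace and drops
    # leading/trailing whitespace, so join() yields single spaces only.
    return ' '.join(without_groups.split())

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'
print(strip_parens(s))                  # word1 word2 word5 word6 word9 word10
print(repr(strip_parens('(a) b (c)')))  # 'b'
```

The trade-off is that whitespace elsewhere in the string is normalized too, which may not be wanted if the original spacing matters.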

You should replace all occurrences of this regular expression with an empty string: \\([^\\)]*\\)
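In Python, a negated-character-class pattern like this can be written as a raw string, so the backslashes need no doubling. The following is a small sketch, assuming the parentheses are not nested. `[^)]*` matches any run of characters other than a closing parenthesis, so unlike `.*?` (without `re.DOTALL`) it also matches across line breaks.

```python
import re

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'
pattern = re.compile(r'\([^)]*\)')

result = pattern.sub('', s)
print(repr(result))  # 'word1 word2  word5 word6  word9 word10'

# A group that spans a newline: the character class matches it,
# while the lazy-dot pattern leaves the string unchanged.
multi = 'a (b\nc) d'
print(repr(pattern.sub('', multi)))         # 'a  d'
print(repr(re.sub(r'\(.*?\)', '', multi)))  # 'a (b\nc) d'
```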

You could go through the string character by character, keeping a result string, a discard string, and a boolean that records whether or not you're deleting right now.

For each character: if it's an open bracket, add it to the discard string and set the boolean to true; if it's a close bracket, reset the discard string to "" and set the boolean to false. Otherwise, add the character to the discard string if the boolean is true, or to the result string if it's false.

At the end, the discard string is non-empty only if a bracket was opened but never closed.

If you want to deal with nested brackets, use an integer count of how many you've opened but not closed, instead of a boolean.
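The character-by-character approach described above could be sketched roughly as follows. The function name `remove_bracketed` and its parameters are invented for this example; it uses the integer counter variant, and it makes one extra assumption not stated in the answer: a stray closing bracket with nothing open is kept as ordinary text.

```python
def remove_bracketed(text, open_ch='(', close_ch=')'):
    """Drop everything inside brackets, tracking nesting with a counter."""
    kept = []        # characters that belong to the result
    discarded = []   # text of the bracket group currently being skipped
    depth = 0        # number of brackets opened but not yet closed

    for ch in text:
        if ch == open_ch:
            depth += 1
            discarded.append(ch)
        elif ch == close_ch and depth > 0:
            depth -= 1
            discarded.append(ch)
            if depth == 0:
                discarded.clear()   # group closed properly; forget it
        elif depth > 0:
            discarded.append(ch)
        else:
            # Outside any bracket (assumption: a stray close bracket is kept).
            kept.append(ch)

    # Anything left in `discarded` came from a bracket that never closed.
    return ''.join(kept), ''.join(discarded)

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'
result, unclosed = remove_bracketed(s)
print(repr(result))    # 'word1 word2  word5 word6  word9 word10'
print(repr(unclosed))  # ''
print(remove_bracketed('a (b (c) d) e (f'))  # ('a  e ', '(f')
```

Like the first regex, this leaves the surrounding spaces in place, so a whitespace clean-up step may still be needed afterwards.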

If the format of your lines is always like the one you show, you could probably try it without regexes (note that this strips only the bracket characters themselves and keeps the words between them):

>>> s.replace('(','').replace(')','')
'word1 word2 word3 word4 word5 word6 word7 word8 word9 word10'

This is about 4 times faster than regular expressions:

>>> import timeit
>>> t1 = timeit.Timer("s.replace('(','').replace(')','')", "from __main__ import s")
>>> t2 = timeit.Timer(r"sub(r'\(.*?\)\ *', '', s)", "from __main__ import s; from re import sub")
>>> t1.repeat()
[0.73440917436073505, 0.6970294320000221, 0.69534249907820822]
>>> t2.repeat()
[2.7884134544113408, 2.7414613750137278, 2.7336896241081377]
