
Remove contents between brackets from string

I have a string like this:

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'

How can I delete everything that is in brackets, so that the output is:

'word1 word2 word5 word6 word9 word10'

I tried a regular expression, but that doesn't seem to work. Any suggestions?

Best, Jacques

import re
s = re.sub(r'\(.*?\)', '', s)

Note that this deletes only the parentheses and everything between them. This means you'll be left with a double space between "word2" and "word5". Output from my terminal:

>>> re.sub(r'\(.*?\)', '', s)
'word1 word2  word5 word6  word9 word10'
>>> # -------^ -----------^ (Note double spaces there)

However, the output you asked for has no double spaces. To remove the extra spaces, you can do something like this:

>>> re.sub(r'\(.*?\)\ *', '', s)
'word1 word2 word5 word6 word9 word10'

My solution is better just because it deletes the extra space character ;-)

re.sub(r"\s\(.*?\)", "", s)

EDIT: You are right, it does not catch all cases. Of course, I can write a more complex expression that takes more detail into account:

re.sub(r"\s*\(.*?\)\s*", " ", s)

Now the result is the desired string, or " " if the original string consists only of a parenthesized group and surrounding spaces.
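The whitespace handling discussed above can also be done in a separate step. The following is a minimal sketch, not taken from any of the answers: the helper name `strip_parenthesized` is hypothetical, and it assumes that collapsing every run of whitespace in the string (not only the spaces next to a removed group) is acceptable.

```python
import re

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'

def strip_parenthesized(text):
    """Hypothetical helper: drop (...) groups, then normalize whitespace."""
    # Replace each non-nested group with a space so neighbours never fuse.
    without_groups = re.sub(r'\(.*?\)', ' ', text)
    # split() with no arguments discards leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return ' '.join(without_groups.split())

print(repr(strip_parenthesized(s)))
# 'word1 word2 word5 word6 word9 word10'
```

Unlike the single-regex versions, this leaves no leading or trailing space when the string starts or ends with a parenthesized group.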

You should replace all occurrences of this regular expression with an empty string: \\([^\\)]*\\)
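The doubled backslashes in that pattern look like string-literal escaping. As a sketch in Python, assuming the intended regex is `\([^)]*\)`, it can be written as a raw string. The example also shows one practical difference from `\(.*?\)`: a negated character class matches newlines, while `.` does not unless `re.DOTALL` is set.

```python
import re

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'

# [^)]* matches any run of characters other than ")", newlines included,
# so neither a non-greedy quantifier nor re.DOTALL is needed.
pattern = re.compile(r'\([^)]*\)')

result = pattern.sub('', s)
print(repr(result))  # the spaces around each removed group remain

multiline = 'keep (drop\nthis) keep'
print(repr(pattern.sub('', multiline)))         # group removed
print(repr(re.sub(r'\(.*?\)', '', multiline)))  # unchanged: "." stops at "\n"
```

Neither pattern handles nested parentheses.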

You could go through it character by character. Keep one string that is the result string, one string that is the discard string, and a boolean for whether or not you're deleting right now.

Then, for each character: if it's an open bracket, add it to the delete string and set the boolean to true; if it's a close bracket, set the delete string to "" and set the boolean to false. Otherwise, if the boolean is true, add the character to the delete string, and if it's false, add it to the result string.

Finally, this leaves you with a non-empty delete string at the end if a bracket was opened but never closed.

If you want to deal with nested brackets, use an integer count of how many you've opened but not yet closed, instead of a boolean.
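The character-by-character approach described above could look roughly like this in Python. It is a sketch under assumptions: the function name `remove_bracketed` is made up for illustration, it uses the integer depth counter rather than a boolean, and it keeps a stray ")" that has no matching "(" as ordinary text.

```python
def remove_bracketed(text):
    """Return (kept, pending): the text outside brackets, and whatever
    was collected after a "(" that was never closed (empty if balanced)."""
    kept = []      # characters outside any bracket (the result string)
    pending = []   # characters since the outermost open bracket (the delete string)
    depth = 0      # number of brackets opened but not yet closed
    for ch in text:
        if ch == '(':
            depth += 1
            pending.append(ch)
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                pending.clear()      # outermost group closed: discard it
            else:
                pending.append(ch)   # still inside an outer group
        elif depth > 0:
            pending.append(ch)
        else:
            kept.append(ch)
    return ''.join(kept), ''.join(pending)

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'
print(remove_bracketed(s))
print(remove_bracketed('a (b (c) d) e'))  # nested group removed as a whole
print(remove_bracketed('a (b c'))         # unclosed bracket ends up in pending
```

Returning the leftover delete string lets the caller decide whether unclosed text should be appended back to the result or dropped.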

If the format of your lines is always like the one you show, you could probably try it without regexes:

>>> s.replace('(','').replace(')','')
'word1 word2 word3 word4 word5 word6 word7 word8 word9 word10'

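This is about 4 times faster than the regular expression in the timings below. Note, however, that it only strips the bracket characters and keeps the words inside them, so the two calls do not produce the same result: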

>>> import timeit
>>> t1 = timeit.Timer("s.replace('(','').replace(')','')", "from __main__ import s")
>>> t2 = timeit.Timer(r"sub(r'\(.*?\)\ *', '', s)", "from __main__ import s; from re import sub")
>>> t1.repeat()
[0.73440917436073505, 0.6970294320000221, 0.69534249907820822]
>>> t2.repeat()
[2.7884134544113408, 2.7414613750137278, 2.7336896241081377]
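For anyone who wants to rerun the comparison as a script, here is a rough sketch. The function names are hypothetical, and the `number` and `repeat` values are arbitrary choices for a quick run. Timings depend on the machine and Python version, so none are shown. The script prints each function's result next to its timing, because the two functions do different things.

```python
import re
import timeit

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'

def strip_brackets_only(text):
    # Removes only the "(" and ")" characters; the words inside stay.
    return text.replace('(', '').replace(')', '')

def strip_groups(text):
    # Removes each parenthesized group and any spaces right after it.
    return re.sub(r'\(.*?\) *', '', text)

for func in (strip_brackets_only, strip_groups):
    # repeat() returns one total time (in seconds) per repetition.
    timings = timeit.repeat(lambda: func(s), number=10_000, repeat=3)
    print(func.__name__, repr(func(s)), min(timings))
```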
