I am trying to remove all bracketed and parenthesized text. I am using the regex:
re.sub(r'\(.*\) | \[.*\]', '', text)
This works for things like:
import re
text = 'the (quick) brown fox jumps over the [lazy] dog'
print re.sub(r'\(.*\) | \[.*\]', '', text)
> the brown fox jumps over the dog
text = '(the quick) brown fox jumps over the [lazy] dog'
print re.sub(r'\(.*\) | \[.*\]', '', text)
> brown fox jumps over the dog
But it fails when the entire string is bracketed and should be removed:
text = '[the quick brown fox jumps over the lazy dog]'
print re.sub(r'\(.*\) | \[.*\]', '', text)
> [the quick brown fox jumps over the lazy dog]
> # This should be '' (the empty string) #
Where am I going wrong?
You have extra spaces in the regex. Just remove the space before and after the |:
re.sub(r'\(.*\)|\[.*\]', '', text)
Or, to keep your existing output, make the surrounding whitespace an optional match:
re.sub(r'\(.*\)\s?|\s?\[.*\]', '', text)
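To see the difference between the three patterns side by side, here is a minimal sketch in Python 3 syntax. The variable names are only for illustration and are not from the question.

```python
import re

text = '[the quick brown fox jumps over the lazy dog]'

# Original pattern: the literal spaces around | belong to each alternative,
# so '(...)' must be followed by a space and '[...]' must be preceded by one.
with_spaces = re.sub(r'\(.*\) | \[.*\]', '', text)

# Spaces removed: the fully bracketed string now matches.
without_spaces = re.sub(r'\(.*\)|\[.*\]', '', text)

# Optional whitespace: also consumes one adjacent space when it is present.
sample = 'the (quick) brown fox jumps over the [lazy] dog'
optional_ws = re.sub(r'\(.*\)\s?|\s?\[.*\]', '', sample)

print(repr(with_spaces))     # unchanged input
print(repr(without_spaces))  # ''
print(repr(optional_ws))     # 'the brown fox jumps over the dog'
```

The first call leaves the input untouched because there is no space before the [ for the second alternative to match.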
Your regex contains literal spaces around the |, and it is trying to match them :)
Try:
re.sub(r'\(.*\)|\[.*\]', '', text)
A good way to investigate when a regex does weird stuff like this is an interactive regex tester. It's a nice way to see what's going wrong. For example, in your case it didn't match "(pace)" but matched "(pace) " as soon as I put a space after it.
Note:
As I mentioned in the comment, be aware that the greedy match might do unexpected things if your text contains a stray ")" that is just a standalone symbol. Consider reluctant (non-greedy) matching instead:
re.sub(r'\(.*?\)|\[.*?\]', '', text)
which would turn:
"This is a (small) sample text with a ) symbol" ===> "This is a sample text with a ) symbol"
whereas yours currently would give:
"This is a (small) sample text with a ) symbol" ===> "This is a symbol"
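The greedy versus reluctant behaviour can be checked with a short sketch like the one below (Python 3 syntax). The names greedy, lazy and tidy are made up for this example, and the whitespace clean-up at the end is an optional extra, not part of the original answer.

```python
import re

sample = 'This is a (small) sample text with a ) symbol'

# Greedy: .* runs to the LAST ')' in the string.
greedy = re.sub(r'\(.*\)|\[.*\]', '', sample)

# Reluctant: .*? stops at the FIRST ')' after the opening '('.
lazy = re.sub(r'\(.*?\)|\[.*?\]', '', sample)

# Optional: collapse the double space left where the text was removed.
tidy = re.sub(r'\s+', ' ', lazy).strip()

print(repr(greedy))  # 'This is a  symbol'
print(repr(lazy))    # 'This is a  sample text with a ) symbol'
print(repr(tidy))    # 'This is a sample text with a ) symbol'
```

Note that both substitutions leave two consecutive spaces behind, which is why the last step normalizes whitespace.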
import re
text = '''[the quick brown fox jumps over the lazy dog]
the (quick) brown fox jumps over the [lazy] dog
(the quick) brown fox jumps over the [lazy] dog'''
print (re.sub(r'[(\[].+?[)\]]', '', text))
out (after an empty first line, since the fully bracketed line is removed entirely):
the brown fox jumps over the dog
brown fox jumps over the dog
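One thing to keep in mind with the character-class version: it does not require the opening and closing symbols to be of the same kind, so "(" can be closed by "]". A small sketch of that caveat follows (Python 3; the names char_class, paired and mixed are only illustrative).

```python
import re

# Character-class pattern from the answer above: any opener, any closer.
char_class = re.compile(r'[(\[].+?[)\]]')

# Alternation pattern: each opener must be closed by its own kind.
paired = re.compile(r'\(.*?\)|\[.*?\]')

mixed = 'keep (this] text'

by_class = char_class.sub('', mixed)  # '(this]' is removed
by_pair = paired.sub('', mixed)       # nothing matches, input unchanged

print(repr(by_class))  # 'keep  text'
print(repr(by_pair))   # 'keep (this] text'
```

If mismatched pairs never occur in your data, the shorter pattern behaves the same as the alternation for non-empty bracket contents.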