Python Regex remove comments or numbers in brackets

Question

I am trying to remove line numbers and comments using regex, but it does not work just yet:

import re
string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

#res = re.sub(r"[\(\[].*?[\)\]]", "", string)

res = re.sub("(\d+) ","", string)
res = re.sub("[.*]","", res)
res = re.sub(r"-\s","", res)
res = re.sub(r"[^\w\säüöß]","", res)
res = re.sub("-\n","", res)
print(res.split())

So I was trying to remove anything in brackets () and [] with my commented-out line, but then I am stuck with whitespace at the start of each line. Then I decided to split it up and came up with the five re.sub calls.

Result should be like this:

['At', 'what', 'time', 'Schoolbus', 'So', 'late', 'already']

I am stuck with the line numbers not being removed, although they are in () and should be gone. This in turn causes my re.sub() for joining words split by "-" (from School- bus to Schoolbus) to not work either.

Answer 1

You may use this sub + findall solution:

import re

string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

print (re.findall(r'\b\w+(?:-\w+)*', re.sub(r'(\([^)]*\)|\[[^]]*\]|-)\s*', '', string)))

Output:

['At', 'what', 'time', 'Schoolbus', 'So', 'late', 'already']
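
To make the one-liner easier to follow, here is the same approach split into two steps with the intermediate string printed. This is only a sketch; the names cleaned and words are introduced here for illustration and are not part of the original answer.

```python
import re

string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

# Step 1: drop "(...)", "[...]" and "-" together with any whitespace after them.
# \s* also consumes the newline after "School-", which joins it with "bus".
cleaned = re.sub(r'(\([^)]*\)|\[[^]]*\]|-)\s*', '', string)
print(repr(cleaned))

# Step 2: collect the remaining words; punctuation is skipped because it is not \w.
words = re.findall(r'\b\w+(?:-\w+)*', cleaned)
print(words)
```

The first print shows that punctuation is still present after the substitution; it is findall that leaves it out of the final list.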

Details:

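re.sub(r'(\([^)]*\)|\[[^]]*\]|-)\s*', '', string) : Removes every (...) group, every [...] group, and every - together with any whitespace that follows it. Because \s* also matches the newline, "School-" and "bus" are joined into "Schoolbus"
\b\w+ : Matches 1+ word characters starting at a word boundary
(?:-\w+)* : Optionally matches further hyphen-joined parts of a word; since the sub call has already removed every -, it matches nothing for this input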
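
As for why the attempts in the question left the line numbers and comments in place: the likely cause is that the parentheses and square brackets were not escaped, so the regex engine treated them as a capture group and a character class. The sketch below compares the unescaped and escaped patterns on the first line of the sample text; the variable names are made up for this illustration.

```python
import re

line = "(1) At what time.!? [asdf] School-"

# Unescaped parentheses form a capture group: this looks for digits followed by
# a space, but "1" is followed by ")", so nothing is removed.
unescaped_parens = re.sub(r"(\d+) ", "", line)

# Escaped parentheses match the literal "(" and ")".
escaped_parens = re.sub(r"\(\d+\) ", "", line)

# "[.*]" is a character class: it removes single "." and "*" characters only.
char_class = re.sub(r"[.*]", "", line)

# Escaped brackets with a lazy ".*?" match the whole "[asdf]" comment.
escaped_brackets = re.sub(r"\[.*?\]", "", line)

print(unescaped_parens)
print(escaped_parens)
print(char_class)
print(escaped_brackets)
```

Using raw strings (the r prefix) for the patterns also avoids Python interpreting backslash sequences such as \d before the regex engine sees them.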

solution1
2 ACCEPTED 2021-04-15 18:21:47