
Python regex: remove comments or numbers in brackets

I am trying to remove line numbers and comments using regex, but it does not work yet:

import re
string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

#res = re.sub(r"[\(\[].*?[\)\]]", "", string)

res = re.sub(r"(\d+) ", "", string)
res = re.sub(r"[.*]", "", res)
res = re.sub(r"-\s", "", res)
res = re.sub(r"[^\w\säüöß]", "", res)
res = re.sub(r"-\n", "", res)
print(res.split())

So I tried to remove anything in brackets () and [] with my commented-out line, but then I was left with a whitespace at the start of each line. Then I decided to split it up and came up with the five re.sub calls.

The result should look like this:

['At', 'what', 'time', 'Schoolbus', 'So', 'late', 'already']
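The leading whitespace from the commented-out attempt comes from the space that follows each (1) or [asdf]: the pattern removes the bracketed text but not that space. A minimal sketch (an editorial illustration, not the poster's or the answerer's code) that keeps the same one-liner, adds a trailing `\s*`, and then joins the hyphenated line break before collecting words:

```python
import re

string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

# Same idea as the commented-out line, plus \s* so the space after each
# bracketed group is removed too (no leftover leading whitespace).
no_brackets = re.sub(r"[\(\[].*?[\)\]]\s*", "", string)

# Join a word split across lines with a trailing hyphen ("School-\nbus").
joined = re.sub(r"-\s+", "", no_brackets)

# Keep only runs of word characters; punctuation such as .!?, is dropped.
words = re.findall(r"\w+", joined)
print(words)  # ['At', 'what', 'time', 'Schoolbus', 'So', 'late', 'already']
```

In Python 3, `\w` on str patterns is Unicode-aware, so letters such as ä, ü, ö and ß are kept without listing them explicitly.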

I am stuck with the line numbers not being removed, even though they are in () and should be gone. That in turn breaks my re.sub() that joins words split with "-", so "School- bus" does not become "Schoolbus" either.
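The line numbers survive because in `"(\d+) "` the parentheses are regex grouping syntax, not literal characters: the pattern only matches digits followed directly by a space, and in `(1) At` the `1` is followed by `)`. Similarly, `"[.*]"` is a character class that deletes single `.` or `*` characters rather than `[...]` comments. With `(2)` still in place, the `-\s` step turns `School-\n(2) bus` into `School(2) bus`, which the punctuation filter then reduces to `School2 bus`. An illustrative sketch on the first line only, comparing the unescaped and escaped forms:

```python
import re

line = "(1) At what time.!? [asdf] School-"

# Unescaped: ( ) form a capturing group, so this needs "<digits><space>".
# Here "1" is followed by ")", so nothing is removed.
unescaped = re.sub(r"(\d+) ", "", line)

# Escaped: \( and \) match the literal parentheses around the number.
escaped = re.sub(r"\(\d+\)\s*", "", line)

# "[.*]" is a character class: it deletes every "." and "*", not "[...]".
char_class = re.sub(r"[.*]", "", line)

print(unescaped)   # (1) At what time.!? [asdf] School-
print(escaped)     # At what time.!? [asdf] School-
print(char_class)  # (1) At what time!? [asdf] School-
```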

You can use this sub + findall solution:

import re

string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

print(re.findall(r'\b\w+(?:-\w+)*', re.sub(r'(\([^)]*\)|\[[^]]*\]|-)\s*', '', string)))

Output:

['At', 'what', 'time', 'Schoolbus', 'So', 'late', 'already']

Details:

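  • re.sub(r'(\([^)]*\)|\[[^]]*\]|-)\s*', '', string) : Removes every (...) group, every [...] group and every hyphen, together with any whitespace (including a newline) that follows it. This deletes the line numbers and comments and joins School- and bus into Schoolbus.
  • \b\w+(?:-\w+)* : Matches 1+ word characters starting at a word boundary, optionally followed by hyphen-joined parts. Since the first step has already removed all hyphens, this effectively returns plain words and skips the punctuation.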
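To see what each stage of the answer contributes, the two patterns can be run separately and the intermediate string inspected. A minimal sketch: the patterns are copied from the answer, while the helper name `strip_comments_and_numbers` is hypothetical.

```python
import re

# Stage 1 (from the answer): delete "(...)" groups, "[...]" groups and hyphens,
# along with any whitespace (including the newline) that follows them.
REMOVE = re.compile(r'(\([^)]*\)|\[[^]]*\]|-)\s*')

# Stage 2 (from the answer): collect words. (?:-\w+)* would keep hyphenated
# words, but stage 1 has already deleted every hyphen.
WORD = re.compile(r'\b\w+(?:-\w+)*')


def strip_comments_and_numbers(text):
    """Hypothetical helper: return (intermediate string, word list)."""
    cleaned = REMOVE.sub('', text)
    return cleaned, WORD.findall(cleaned)


string = """(1) At what time.!? [asdf] School-
(2) bus. So late, already.!? [ghjk]"""

cleaned, words = strip_comments_and_numbers(string)
print(repr(cleaned))  # 'At what time.!? Schoolbus. So late, already.!? '
print(words)          # ['At', 'what', 'time', 'Schoolbus', 'So', 'late', 'already']
```

Compiling the patterns up front is only for readability; the module-level re.sub and re.findall cache compiled patterns as well. One side effect worth knowing: because the hyphen alternative removes every `-`, an ordinary hyphenated word such as `well-known` would come out as `wellknown`; matching only a hyphen followed by whitespace (`-\s+`) would limit the join to line breaks.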
