How can I remove text within parentheses with a regex in Python?
I followed an answer on Stack Overflow, but it is not working.
How can I solve my problem?
import re

def clean_text(text):
    pattern = '([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)'
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '(http|ftp|https)://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '([ㄱ-ㅎㅏ-ㅣ]+)'
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '<[^>]*>'
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '[^\w\s]'
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '\([^)]*\)'  # not working!!
    text = re.sub(pattern=pattern, repl='', string=text)
    return text

text = '(abc_def) 좋은글! (이것도 지워조) http://1234.com 감사합니다. aaa@goggle.comㅋㅋ<H1>thank you</H1>'
clean_text(text)
The result is: abc_def 좋은글 이것도 지워조 감사합니다 thank you
My goal is: 좋은글 감사합니다 thank you
Try this:
def clean_text(text):
    pattern = '([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)'  # e-mail addresses
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '(http|ftp|https)://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'  # URLs
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '([ㄱ-ㅎㅏ-ㅣ]+)'  # standalone Korean jamo such as ㅋㅋ
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '<[^>]*>'  # HTML tags
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '\([^)]*\)\s'  # parenthesized text plus one trailing whitespace
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '[^\w\s+]'  # remaining punctuation except '+' (now runs after the parentheses step)
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = '\s{2,}'  # collapse runs of whitespace into one space
    text = re.sub(pattern=pattern, repl=' ', string=text)
    return text
The result will be exactly: 좋은글 감사합니다 thank you
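One thing to watch with the pattern above: it ends in \s, so a parenthesized group is only removed when a whitespace character follows it. A minimal sketch of that edge case, where the sample string and the optional_space variant are my own illustrations and not part of the answer:

```python
import re

# Pattern from the answer above: requires one whitespace character after ")".
with_trailing_space = r'\([^)]*\)\s'
# Hypothetical variant for illustration: the trailing whitespace is optional.
optional_space = r'\([^)]*\)\s*'

sample = 'hello (note) world (end)'

# "(end)" sits at the end of the string, so the first pattern leaves it alone.
kept_last_group = re.sub(with_trailing_space, '', sample)
removed_all = re.sub(optional_space, '', sample).strip()

print(repr(kept_last_group))  # 'hello world (end)'
print(repr(removed_all))      # 'hello world'
```

If your input can end with a parenthesized group, the optional form (or the \s* prefix used in the other answer) is probably the safer choice.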
Your [^\w\s] re.sub removes the parentheses, so the last regex no longer has anything to match. You may swap the last two re.subs and use:
import re

def clean_text(text):
    # e-mail addresses
    pattern = r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)'
    text = re.sub(pattern=pattern, repl='', string=text)
    # URLs
    pattern = r'(?:http|ftp|https)://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
    text = re.sub(pattern=pattern, repl='', string=text)
    # standalone Korean jamo such as ㅋㅋ
    pattern = r'[ㄱ-ㅎㅏ-ㅣ]+'
    text = re.sub(pattern=pattern, repl='', string=text)
    # HTML tags
    pattern = r'<[^>]*>'
    text = re.sub(pattern=pattern, repl='', string=text)
    # parenthesized text, together with any whitespace before it
    pattern = r'\s*\([^)]*\)'
    text = re.sub(pattern=pattern, repl='', string=text)
    # remaining punctuation (must run after the parentheses step)
    pattern = r'[^\w\s]'
    text = re.sub(pattern=pattern, repl='', string=text)
    return text.strip()

text = '(abc_def) 좋은글! (이것도 지워조) http://1234.com 감사합니다. aaa@goggle.comㅋㅋ<H1>thank you</H1>'
print(clean_text(text))
See the online Python demo.
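To see why the order of the last two substitutions matters, here is a small sketch on a made-up sample string. The names PARENS, PUNCT, punct_first and parens_first are only for illustration:

```python
import re

PARENS = r'\s*\([^)]*\)'  # parenthesized text plus preceding whitespace
PUNCT = r'[^\w\s]'        # anything that is neither a word character nor whitespace

sample = 'good (drop me) text!'

# Punctuation first: "(" and ")" are stripped, so PARENS finds nothing to match.
punct_first = re.sub(PARENS, '', re.sub(PUNCT, '', sample))
# Parentheses first: the whole group goes away before punctuation is stripped.
parens_first = re.sub(PUNCT, '', re.sub(PARENS, '', sample))

print(repr(punct_first))   # 'good drop me text'
print(repr(parens_first))  # 'good text'
```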
I suggest using raw string literals (note the r'' prefixes) and stripping the unnecessary spaces with text.strip(). The \s* in r'\s*\([^)]*\)' will remove 0 or more whitespaces before parentheses.
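A quick sketch of what the \s* prefix changes, again on a made-up sample. Note that \s* only eats whitespace before a group, so a double space can still be left behind when extra whitespace follows it (as happens in the question's text once the URL is removed). The final collapse step is borrowed from the other answer's \s{2,} pattern and is an optional extra, not part of the code above:

```python
import re

sample = 'first (one)  second (two) third'

# Without \s*: the space before each group survives the substitution.
without_ws = re.sub(r'\([^)]*\)', '', sample)
# With \s*: whitespace before each group is removed along with the group.
with_ws = re.sub(r'\s*\([^)]*\)', '', sample)
# Optional extra step: collapse any remaining run of whitespace to one space.
collapsed = re.sub(r'\s{2,}', ' ', with_ws).strip()

print(repr(without_ws))  # 'first   second  third'
print(repr(with_ws))     # 'first  second third'
print(repr(collapsed))   # 'first second third'
```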