How to remove strings between two characters using regular expressions in Python
I am trying to clean up some logs and want to extract general information from the messages. I am new to Python and only learned regular expressions yesterday, and now I have run into a problem.
My messages look like this:
Report ZSIM_RANDOM_DURATION_ started
Report ZSIM_SYSTEM_ACTIVITY started
Report /BDL/TASK_SCHEDULER started
Report ZSIM_JOB_CREATE started
Report RSBTCRTE started
Report SAPMSSY started
Report RSRZLLG_ACTUAL started
Report RSRZLLG started
Report RGWMON_SEND_NILIST started
I tried some code:
clean_special2=re.sub(r'^[Report] [^1-9] [started]','',text)
But I think this code will remove entire rows, whereas I want to keep the format "Report ... started". So I only want to remove the job name in the middle.
I expect my outcome to look like this:
Report started
Can anyone help me with an idea? Thank you very much!
Try something like this:
clean_special2=re.sub(r'(?<=^Report\b).*(?=\bstarted)',' ',text)
Explanation: (?<=...) is a positive lookbehind, i.e. the string must match the content of this group, but the group is not consumed as part of the match and thus not replaced. The same applies on the other side with a positive lookahead (?=...). The \b is a word boundary, so everything between these two words is matched. Since this also trims away the surrounding whitespace, the replacement is a single space.
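A minimal runnable sketch of the lookaround approach, assuming the log lines are held in one multi-line string (the sample text and the name pattern are illustrative, not from the answer). One detail worth noting: ^ only matches at the start of every line when re.MULTILINE is passed; without that flag only the first line would be changed.

```python
import re

# Sample log text taken from the question; assumed to be one multi-line string.
text = """Report ZSIM_RANDOM_DURATION_ started
Report /BDL/TASK_SCHEDULER started
Report RSBTCRTE started"""

# The lookbehind keeps "Report", the lookahead keeps "started";
# only the text in between (job name plus surrounding spaces) is replaced.
# re.MULTILINE makes ^ match at the start of every line, not just the first.
pattern = re.compile(r'(?<=^Report\b).*(?=\bstarted)', flags=re.MULTILINE)
clean_special2 = pattern.sub(' ', text)

print(clean_special2)
```

If each message is processed as its own string, the flag is not needed.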
I don't know the Python syntax, but I am sure this regexp can help you match your string:
/^Report\W+([\w&.#@%^!~ -]+)\W+started/m
The Python code might look like this:
text = "Report ZSIM_RANDOM_DURATION_ started"
clean_special2=re.sub(r'^Report\W+([\w&.#@%^!~ -]+)\W+started','Report started',text)
This should do: '^Report\ [^\ ]*\ started'
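A sketch of how this simpler pattern might be applied. The answer gives only the pattern, so the replacement string 'Report started' and the re.MULTILINE flag are assumptions based on the output the question asks for. The pattern relies on the job name containing no spaces.

```python
import re

# Sample lines from the question, assumed to be one multi-line string.
text = """Report ZSIM_SYSTEM_ACTIVITY started
Report /BDL/TASK_SCHEDULER started
Report RSRZLLG started"""

# "Report", one space, a run of non-space characters (the job name),
# one space, then "started". Plain spaces behave the same as escaped ones here.
# The replacement string is an assumption: it rebuilds the line without the job name.
cleaned = re.sub(r'^Report [^ ]* started', 'Report started', text, flags=re.MULTILINE)

print(cleaned)
```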
Regex is black magic; only use it when you have to. Online tools make it much easier to write: https://regex101.com/
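In that spirit, a regex-free sketch, assuming every relevant line starts with "Report " and ends with " started" (the helper name strip_job_name is hypothetical and not part of any answer above):

```python
def strip_job_name(line: str) -> str:
    """Return 'Report started' for lines of the form 'Report <job> started'."""
    if line.startswith('Report ') and line.endswith(' started'):
        return 'Report started'
    return line  # leave any other line untouched


# Sample input: two lines from the question plus one unrelated line.
text = """Report SAPMSSY started
Some other message
Report RGWMON_SEND_NILIST started"""

cleaned = '\n'.join(strip_job_name(line) for line in text.splitlines())
print(cleaned)
```

This trades flexibility for readability: it handles only this one fixed format, which may be all the log cleanup needs.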