Remove specific lines from text file
I have a huge log file containing a bunch of lines like:
...
Useful stuff
...
Finished 0 of 435
Finished 1 of 435
...
Finished 435 of 435
...
Other useful stuff
How can I elegantly remove all the "Finished n of N" lines except the last one, "Finished N of N"?
This needs to be done on Windows, e.g. with Python or GNU tools.
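You can use awk. It skips (next) every line that starts with "Finished" and whose 2nd field differs from its 4th, and prints (1) all other lines: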
awk '/^Finished/ && $2!=$4 {next}1' logfile
Output:
...
Useful stuff
...
...
Finished 435 of 435
...
Other useful stuff
Note: On Windows (cmd.exe) you might have to use double quotes instead of single quotes.
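If awk is not available on the Windows machine, the same line-by-line logic ports to plain Python. This is a sketch, not part of the original answer: the function names `drop_progress_lines` and `filter_file` are hypothetical, UTF-8 encoding is assumed, and the check is deliberately a little stricter than the awk one-liner (a line only counts as progress output if it has exactly the shape "Finished <n> of <N>", so a line like "Finished downloading" is kept).

```python
from typing import Iterable, Iterator


def drop_progress_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except 'Finished n of N' progress lines with n != N."""
    for line in lines:
        fields = line.split()  # like awk: split on runs of whitespace
        is_progress = (
            len(fields) == 4
            and fields[0] == "Finished"
            and fields[2] == "of"
            and fields[1].isdecimal()
            and fields[3].isdecimal()
        )
        # Compare numerically, as awk does for numeric-looking $2 and $4
        if is_progress and int(fields[1]) != int(fields[3]):
            continue  # the equivalent of awk's {next}
        yield line  # the equivalent of awk's trailing 1 (print the line)


def filter_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path without the intermediate progress lines."""
    # Streams line by line, so a huge log never has to fit in memory.
    # UTF-8 is an assumption; adjust to the log's actual encoding.
    with open(src_path, encoding="utf-8") as src, open(
        dst_path, "w", encoding="utf-8"
    ) as dst:
        dst.writelines(drop_progress_lines(src))
```

Usage: `filter_file("logfile", "logfile.cleaned")`. Because it processes one line at a time, memory use stays flat regardless of the log's size, which matters more here than with a whole-file regex substitution.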
Alternatively, you can substitute the empty string for every match of this regex:
^Finished (\d+) of (?!\1)\d+$
Sample code:
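import re
# $ inside the look-ahead: drop the line unless the total equals the counter
# exactly; \n? also removes the line break, so no blank lines are left behind
p = re.compile(r'^Finished (\d+) of (?!\1$)\d+$\n?', re.MULTILINE | re.IGNORECASE)
test_str = "..."
subst = ""
result = re.sub(p, subst, test_str)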
Pattern explanation:
^            the beginning of a line (with re.MULTILINE)
Finished     'Finished '
(            group and capture to \1:
  \d+        digits (0-9) (1 or more times)
)            end of \1
 of          ' of '
(?!          look ahead to see if there is not:
  \1         what was matched by capture \1
)            end of look-ahead
\d+          digits (0-9) (1 or more times)
$            the end of a line (with re.MULTILINE)
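The explanation above maps directly onto a `re.VERBOSE` pattern, where each piece can carry its description as an inline comment. This is a sketch of the pattern exactly as explained (the name `PATTERN` and the sample log are illustrative); in verbose mode literal spaces are ignored, so they are written as escaped spaces (`\ `).

```python
import re

PATTERN = re.compile(
    r"""
    ^           # the beginning of a line (re.MULTILINE)
    Finished\   # 'Finished '
    (\d+)       # group 1: the counter n
    \ of\       # ' of '
    (?!\1)      # look ahead: the total must not start with n
    \d+         # the total N
    $           # the end of a line (re.MULTILINE)
    """,
    re.MULTILINE | re.VERBOSE,
)

log = "Useful stuff\nFinished 0 of 435\nFinished 435 of 435\nOther useful stuff"
print(PATTERN.sub("", log))
```

Since the match covers only the text of the line and not its trailing newline, each removed line leaves an empty line behind. Also note that this look-ahead only checks whether the total *begins with* the counter.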
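Edit: one slight change to the regex pattern, as suggested in a comment. Without the $ inside the look-ahead, a line such as "Finished 4 of 435" is kept, because 435 starts with 4: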
^Finished (\d+) of (?!\1$)\d+$
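Running both patterns side by side on a few illustrative progress lines shows what the extra $ changes. A match means the line would be deleted by the substitution; the names `OLD` and `NEW` are just labels for this comparison.

```python
import re

# Original pattern: the look-ahead only checks that the total does not
# *start with* the counter.
OLD = re.compile(r"^Finished (\d+) of (?!\1)\d+$", re.MULTILINE)
# Corrected pattern: the look-ahead rejects the line only if the total is
# exactly the counter ("$" anchors it to the end of the line).
NEW = re.compile(r"^Finished (\d+) of (?!\1$)\d+$", re.MULTILINE)

samples = ["Finished 4 of 435", "Finished 43 of 435", "Finished 435 of 435"]
for line in samples:
    # True means the line matches, i.e. it would be removed by re.sub
    print(f"{line!r}: old={bool(OLD.search(line))}, new={bool(NEW.search(line))}")
```

This prints:

    'Finished 4 of 435': old=False, new=True
    'Finished 43 of 435': old=False, new=True
    'Finished 435 of 435': old=False, new=False

Only the corrected pattern removes every intermediate line while still keeping the final "Finished 435 of 435".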