Removing specific lines from a text file

Question

I have a huge log file containing a bunch of lines like:

...
Useful stuff
...
Finished 0 of 435
Finished 1 of 435
...
Finished 435 of 435
...
Other useful stuff

How can I elegantly remove all the "Finished n of N" lines except the final "Finished N of N" line?

This needs to be done on Windows, using e.g. Python or GNU tools.

Answer 1

You can use awk:

awk '/^Finished/ && $2!=$4 {next}1' logfile
...
Useful stuff
...
...
Finished 435 of 435
...
Other useful stuff

Note: On Windows you might have to use double quotes instead of single quotes.
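Since the question also allows Python, here is a minimal sketch of the same filter as the awk one-liner, reading the log line by line so a huge file never has to fit in memory. The names keep_line and filter_log, the file paths, and the UTF-8 encoding are assumptions for illustration. Unlike the awk version, which tests any line starting with "Finished", this sketch only touches lines that have exactly the "Finished n of N" shape.

```python
import re

# Matches a whole "Finished n of N" line; \s* tolerates the trailing newline.
FINISHED = re.compile(r"^Finished (\d+) of (\d+)\s*$")

def keep_line(line):
    """Return False only for 'Finished n of N' lines where n != N."""
    m = FINISHED.match(line)
    return m is None or int(m.group(1)) == int(m.group(2))

def filter_log(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, dropping the intermediate progress lines.

    The paths and the encoding are placeholders; adjust them to the real log.
    """
    with open(src_path, encoding=encoding) as src, \
         open(dst_path, "w", encoding=encoding) as dst:
        for line in src:  # streams the file one line at a time
            if keep_line(line):
                dst.write(line)

# Small in-memory check of the predicate
lines = [
    "Useful stuff\n",
    "Finished 0 of 435\n",
    "Finished 1 of 435\n",
    "Finished 435 of 435\n",
    "Other useful stuff\n",
]
kept = [line for line in lines if keep_line(line)]
print("".join(kept), end="")
```

Calling filter_log("logfile", "logfile.cleaned") would write the filtered copy next to the original; both file names are hypothetical.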

Answer 2

You can try replacing every match of the following pattern with an empty string:

^Finished (\d+) of (?!\1)\d+$

Here is a demo (Debuggex Demo).

Sample code:

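import re

# Plain raw string: the Python 2-only ur'' prefix is a syntax error in Python 3
p = re.compile(r'^Finished (\d+) of (?!\1)\d+$', re.MULTILINE | re.IGNORECASE)
test_str = "..."  # the log text goes here
subst = ""

# Replace every matching line with an empty string
result = p.sub(subst, test_str)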

Pattern explanation:

  ^                        the beginning of a line (re.MULTILINE is set)
  Finished                 'Finished '
  (                        group and capture to \1:
    \d+                      digits (0-9) (1 or more times)
  )                        end of \1
   of                      ' of '
  (?!                      look ahead to see if there is not:
    \1                       what was matched by capture \1
  )                        end of look-ahead
  \d+                      digits (0-9) (1 or more times)
  $                        the end of a line (re.MULTILINE is set)
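The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch: the sample string is made up, and spaces are written as [ ] because verbose mode ignores bare whitespace.

```python
import re

# Same pattern as above, spelled out so each part carries its own comment.
PATTERN = re.compile(r"""
    ^            # start of a line, thanks to re.MULTILINE
    Finished[ ]  # the literal text 'Finished '
    (\d+)        # group 1: the first number
    [ ]of[ ]     # the literal text ' of '
    (?!\1)       # negative lookahead: what follows must not start with group 1
    \d+          # the second number
    $            # end of the line
""", re.MULTILINE | re.VERBOSE)

# Made-up sample text for illustration
sample = "Useful stuff\nFinished 0 of 435\nFinished 435 of 435\nOther useful stuff"
matched = [m.group(0) for m in PATTERN.finditer(sample)]
print(matched)  # ['Finished 0 of 435']
```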

EDIT

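One slight change to the regex pattern, as suggested in a comment: with $ inside the lookahead, the whole second number is compared against the first, so a line such as "Finished 4 of 435" (where 435 merely starts with 4) is matched as well.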

^Finished (\d+) of (?!\1$)\d+$

DEMO
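To see what the edit changes, the sketch below runs both patterns over a small made-up log in which the first number is a prefix of the second. The variable names and the sample text are assumptions. The last step appends an optional \n to the edited pattern so the substitution removes the line break too instead of leaving blank lines behind; that addition is not part of the original answer.

```python
import re

ORIGINAL = re.compile(r"^Finished (\d+) of (?!\1)\d+$", re.MULTILINE)
EDITED = re.compile(r"^Finished (\d+) of (?!\1$)\d+$", re.MULTILINE)

# Made-up sample: 4 and 43 are both prefixes of 435
log = "\n".join([
    "Useful stuff",
    "Finished 4 of 435",
    "Finished 43 of 435",
    "Finished 435 of 435",
    "Other useful stuff",
])

original_hits = [m.group(0) for m in ORIGINAL.finditer(log)]
edited_hits = [m.group(0) for m in EDITED.finditer(log)]
print(original_hits)  # [] : the prefix lines slip through
print(edited_hits)    # ['Finished 4 of 435', 'Finished 43 of 435']

# Variant of the edited pattern that also consumes the trailing newline
EDITED_WITH_NEWLINE = re.compile(r"^Finished (\d+) of (?!\1$)\d+$\n?", re.MULTILINE)
cleaned = EDITED_WITH_NEWLINE.sub("", log)
print(cleaned)
```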
