刪除文本文件中包含python中特定單詞/字符串的整行

Question

所有。 我使用SO中的示例並嘗試刪除文本文件中的幾行/字符串但不成功。 例如，需要刪除的字符串行

             OSPF Process 1 with Router ID 1.1.1.1
                       Area: 0.0.0.11
               Link State Database

我能夠通過如下指定整個字符串/行來刪除這些行，但這一次只能刪除一行而另一個問題是路由器ID和區域可以是任何數字並動態更改。

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
    for line in lines:
        if 'Area: 0.0.0.10' not in line:
            fout.write(line)

我嘗試使用startwith但它不會刪除它。

if not line.startswith('OSPF'):

這是文本文件中的外觀和字符串放置方式。 OSPF ...，Area ...，Link ...行不從左邊開始，它以空格開頭，所以我認為這就是為什么startswith不起作用的原因。


     OSPF Process 1 with Router ID 1.1.1.1
                 Area: 0.0.0.11
         Link State Database 


some textxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

         OSPF Process 1 with Router ID 2.1.1.1
                 Area: 0.0.0.12
         Link State Database 

some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


     OSPF Process 1 with Router ID 2.2.2.2
                 Area: 0.0.0.33
         Link State Database 

some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

刪除這些行后，預計如下所示

some textxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

請進一步指教，謝謝

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


     OSPF Process 1 with Router ID 1.1.1.1
                 Area: 0.0.0.11
         Link State Database

例如，執行腳本時超過5行。它將刪除3行，但仍然保留2行

另一個例子

 * Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
                         Area: 0.0.0.13
                 Link State Database


  Type      : Router
  Ls id     : 1.4.0.2
  Adv rtr   : 1.4.0.2

這有4行（在Area之前和Type之前）...當執行腳本時只有2行刪除...而2將保留....對於此...最終應該如下

* Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low

  Type      : Router
  Ls id     : 1.4.0.2
  Adv rtr   : 1.4.0.2

刪除特定的字符串和行以及下一行（在鏈接狀態數據庫行之后）

clean.txt

**To remove this empty line
To remove this empty line
To remove this empty line**
  Type      : Router
  Ls id     : 1.4.0.1
  Adv rtr   : 1.4.0.1
  Ls age    : 996
  Len       : 48
  Options   :  ASBR  E
  seq#      : 8000002f
  chksum    : 0xe7f5
  Link count: 2
   * Link ID: 1.16.9.9
     Data   : 10.1.155.2
     Link Type: P-2-P
     Metric : 100
   * Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 100
     Priority : Low

  Type      : Router
  Ls id     : 1.16.9.9
  Adv rtr   : 1.16.9.9
  Ls age    : 392
  Len       : 48
  Options   :  ABR  E
  seq#      : 8000001e
  chksum    : 0x3116
  Link count: 2
   * Link ID: 1.4.0.1
     Data   : 10.242.177.21
     Link Type: P-2-P
     Metric : 1
   * Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
**To remove this empty line**

  Type      : Router
  Ls id     : 1.4.0.2
  Adv rtr   : 1.4.0.2
  Ls age    : 1194
  Len       : 96
  Options   :  ASBR  E
  seq#      : 8001cf7b
  chksum    : 0xbfae
  Link count: 6
   * Link ID: 1.4.0.2
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 0
     Priority : Medium
   * Link ID: 1.4.0.1
     Data   : 10.0.0.2
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.0.0.0
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 10
     Priority : Low
   * Link ID: 10.40.8.0
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 100
     Priority : Low
   * Link ID: 19.23.23.15
     Data   : 10.40.10.130
     Link Type: P-2-P
     Metric : 10
   * Link ID: 1.4.10.200
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 10
     Priority : Low
To remove this empty line

  Type      : Router
  Ls id     : 100.100.0.10
  Adv rtr   : 100.100.0.10
  Ls age    : 171
  Len       : 84
  Options   :  ASBR  E
  seq#      : 8001a292
  chksum    : 0x5fa2
  Link count: 5
   * Link ID: 100.100.0.10
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 12
     Priority : Medium
   * Link ID: 10.10.0.1
     Data   : 10.10.10.18
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.10.10.17
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium
   * Link ID: 19.23.23.15
     Data   : 10.10.30.30
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.90.25.30
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium

  Type      : Router
  Ls id     : 10.10.0.1
  Adv rtr   : 10.10.0.1
  Ls age    : 191
  Len       : 96
  Options   :  ASBR  E
  seq#      : 80013bcf
  chksum    : 0x9871
  Link count: 6
   * Link ID: 10.10.0.1
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 12
     Priority : Medium
   * Link ID: 15.51.51.14
     Data   : 10.10.0.130
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.10.0.129
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium
   * Link ID: 100.100.0.10
     Data   : 10.10.10.17
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.10.10.18
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium
   * Link ID: 16.16.16.0
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 10
     Priority : Low

  Type      : Router
  Ls id     : 15.51.51.14
  Adv rtr   : 15.51.51.14
  Ls age    : 2487
  Len       : 60
  Options   :  ASBR  ABR  E
  seq#      : 8000003c
  chksum    : 0x1714
  Link count: 3
   * Link ID: 10.242.95.12
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
   * Link ID: 10.10.0.1
     Data   : 10.10.0.129
     Link Type: P-2-P
     Metric : 1
   * Link ID: 10.10.0.128
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
**To remove this empty line
To remove this empty line**

Answer 1

請注意，該行不是以OSPF開頭，而是以一堆空格開始， 然后是 OSPF。 嘗試先strip線。 此外， startswith可以采用可能的前綴元組，因此您可以一次性檢查所有內容。

for line in lines:
    if not line.strip().startswith(("OSPF", "Area", "Link State")):
        fout.write(line)

請注意，如果實際文本中的某些行也以Area或類似的方式開頭，則可能會失敗。

您還可以使用正則表達式來確保該行必須以某些空格開頭， 然后使用其中一個關鍵字：

import re
for line in lines:
    if not re.match(r"\s+(Area|OSPF|Link State)", line):
        fout.write(line)

Answer 2

您可以使用正則表達式查找某些特定文本並將其刪除。 下面是示例代碼，您可以根據您的要求使用不同的正則表達式。

嘗試以下代碼：

import re
regex = "OSPF|Area|Link"
for line in lines:
    if not re.findall(regex, line):
        print line

Answer 3

你可以做什么而不是逐行閱讀是讀取文本文件的整個內容，並使用特定匹配的模式考慮可能變化的數字部分。

^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database\s*(?:\n|$)

說明

^字符串的開始
[ \\t]*匹配空格或制表符的0倍以上
OSPF Process \\d+ with Router ID \\d+(?:\\.\\d+){3}匹配文本，將進程和路由器ID的數字\\d+格式考慮在內
\\s*Area: \\d+(?:\\.\\d+){3}匹配Area:后跟1位數字，重復3次點和1位數字
\\s*Link State Database匹配空白字符和文字文本的0+倍
\\s*(?:\\n|$)匹配空白字符串的0+次，然后匹配換行符或斷言字符串的結尾

正則表達式演示 | Python演示

例如：

import re

filename = 'raw.txt'
pattern = r"^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database\s*(?:\n|$)"
with open(filename, 'r') as fin:
    res = re.sub(pattern, "", fin.read(), 0, re.MULTILINE)
    text_file = open("clean.txt", "w")
    text_file.write(res)
    text_file.close()

編輯

要匹配后面的空換行符，可以在數據庫后使用add：

[ \\t]*匹配空格或字符串的0+倍
(?:非捕獲組
- (?:\\r?\\n|\\r)[ \\t]*匹配換行符，然后匹配標簽或空格的0+次
)? 關閉非捕獲組並使其可選
$斷言字符串的結尾

完整模式：

^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database[ \t]*(?:(?:\r?\n|\r)[ \t]*)?$

正則表達式演示

刪除文本文件中包含python中特定單詞/字符串的整行

問題描述

3 個解決方案

解決方案1
1 2019-05-17 08:17:09

解決方案2
1 2019-05-17 08:17:52

解決方案3
1 2019-05-17 09:36:50

刪除文本文件中包含python中特定單詞/字符串的整行

問題描述

3 個解決方案

解決方案1 1 2019-05-17 08:17:09

解決方案2 1 2019-05-17 08:17:52

解決方案3 1 2019-05-17 09:36:50

解決方案1
1 2019-05-17 08:17:09

解決方案2
1 2019-05-17 08:17:52

解決方案3
1 2019-05-17 09:36:50