Remove entire lines in a text file that contain a specific word/string in Python

Hi all. I'm using an example from SO to remove a few lines from a text file, but without success. The lines that need to be deleted look like this, for example:

             OSPF Process 1 with Router ID 1.1.1.1
                       Area: 0.0.0.11
               Link State Database  

I am able to delete such a line by specifying the whole string exactly, as below, but this removes only one line at a time. Another problem is that the Router ID and Area can be any number and change dynamically.

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
    for line in lines:
        if 'Area: 0.0.0.10' not in line:
            fout.write(line)

I tried using startswith, but it doesn't remove the lines.

if not line.startswith('OSPF'):

This is how the text file looks, including the placement of the strings. The OSPF..., Area... and Link... lines do not start at the left margin; they start with whitespace, so I think this is why startswith does not work.


     OSPF Process 1 with Router ID 1.1.1.1
                 Area: 0.0.0.11
         Link State Database 


some textxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

         OSPF Process 1 with Router ID 2.1.1.1
                 Area: 0.0.0.12
         Link State Database 

some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


     OSPF Process 1 with Router ID 2.2.2.2
                 Area: 0.0.0.33
         Link State Database 

some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The expected result after removing those lines:

some textxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Please advise. Thank you.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


     OSPF Process 1 with Router ID 1.1.1.1
                 Area: 0.0.0.11
         Link State Database 


For example, the block above has 5 lines. When the script is executed it removes 3 lines, but 2 lines still remain.

Another example:

 * Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
                         Area: 0.0.0.13
                 Link State Database


  Type      : Router
  Ls id     : 1.4.0.2
  Adv rtr   : 1.4.0.2

This has 4 lines (from Area up to the line before Type). When the script is executed only 2 lines are removed and 2 remain. For this case, the final result should look like this:

* Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low

  Type      : Router
  Ls id     : 1.4.0.2
  Adv rtr   : 1.4.0.2

Remove a specific string/line and also the line that follows it (after the Link State Database line)

clean.txt

**To remove this empty line
To remove this empty line
To remove this empty line**
  Type      : Router
  Ls id     : 1.4.0.1
  Adv rtr   : 1.4.0.1
  Ls age    : 996
  Len       : 48
  Options   :  ASBR  E
  seq#      : 8000002f
  chksum    : 0xe7f5
  Link count: 2
   * Link ID: 1.16.9.9
     Data   : 10.1.155.2
     Link Type: P-2-P
     Metric : 100
   * Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 100
     Priority : Low

  Type      : Router
  Ls id     : 1.16.9.9
  Adv rtr   : 1.16.9.9
  Ls age    : 392
  Len       : 48
  Options   :  ABR  E
  seq#      : 8000001e
  chksum    : 0x3116
  Link count: 2
   * Link ID: 1.4.0.1
     Data   : 10.242.177.21
     Link Type: P-2-P
     Metric : 1
   * Link ID: 10.1.155.20
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
**To remove this empty line**

  Type      : Router
  Ls id     : 1.4.0.2
  Adv rtr   : 1.4.0.2
  Ls age    : 1194
  Len       : 96
  Options   :  ASBR  E
  seq#      : 8001cf7b
  chksum    : 0xbfae
  Link count: 6
   * Link ID: 1.4.0.2
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 0
     Priority : Medium
   * Link ID: 1.4.0.1
     Data   : 10.0.0.2
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.0.0.0
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 10
     Priority : Low
   * Link ID: 10.40.8.0
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 100
     Priority : Low
   * Link ID: 19.23.23.15
     Data   : 10.40.10.130
     Link Type: P-2-P
     Metric : 10
   * Link ID: 1.4.10.200
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 10
     Priority : Low
To remove this empty line

  Type      : Router
  Ls id     : 100.100.0.10
  Adv rtr   : 100.100.0.10
  Ls age    : 171
  Len       : 84
  Options   :  ASBR  E
  seq#      : 8001a292
  chksum    : 0x5fa2
  Link count: 5
   * Link ID: 100.100.0.10
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 12
     Priority : Medium
   * Link ID: 10.10.0.1
     Data   : 10.10.10.18
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.10.10.17
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium
   * Link ID: 19.23.23.15
     Data   : 10.10.30.30
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.90.25.30
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium

  Type      : Router
  Ls id     : 10.10.0.1
  Adv rtr   : 10.10.0.1
  Ls age    : 191
  Len       : 96
  Options   :  ASBR  E
  seq#      : 80013bcf
  chksum    : 0x9871
  Link count: 6
   * Link ID: 10.10.0.1
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 12
     Priority : Medium
   * Link ID: 15.51.51.14
     Data   : 10.10.0.130
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.10.0.129
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium
   * Link ID: 100.100.0.10
     Data   : 10.10.10.17
     Link Type: P-2-P
     Metric : 10
   * Link ID: 10.10.10.18
     Data   : 255.255.255.255
     Link Type: StubNet
     Metric : 10
     Priority : Medium
   * Link ID: 16.16.16.0
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 10
     Priority : Low

  Type      : Router
  Ls id     : 15.51.51.14
  Adv rtr   : 15.51.51.14
  Ls age    : 2487
  Len       : 60
  Options   :  ASBR  ABR  E
  seq#      : 8000003c
  chksum    : 0x1714
  Link count: 3
   * Link ID: 10.242.95.12
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
   * Link ID: 10.10.0.1
     Data   : 10.10.0.129
     Link Type: P-2-P
     Metric : 1
   * Link ID: 10.10.0.128
     Data   : 255.255.255.252
     Link Type: StubNet
     Metric : 1
     Priority : Low
**To remove this empty line
To remove this empty line**

Note that the line does not start with OSPF, but with a bunch of spaces, and then OSPF. Try to strip the line first. Also, startswith can take a tuple of possible prefixes, so you can check them all in one go.

for line in lines:
    if not line.strip().startswith(("OSPF", "Area", "Link State")):
        fout.write(line)

Note that this might fail if some of the lines in the actual text also start with Area or similar.
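A self-contained version of the check above, as a minimal sketch. The function name drop_header_lines and the sample lines are made up for illustration; only strip() and startswith() with a tuple come from the answer.

```python
# Prefixes taken from the sample in the question
HEADER_PREFIXES = ("OSPF", "Area", "Link State")


def drop_header_lines(lines):
    """Keep only lines whose stripped text does not start with a header prefix."""
    return [line for line in lines
            if not line.strip().startswith(HEADER_PREFIXES)]


# Hypothetical sample, shaped like the text file in the question
sample = [
    "     OSPF Process 1 with Router ID 1.1.1.1\n",
    "                 Area: 0.0.0.11\n",
    "         Link State Database \n",
    "\n",
    "  Type      : Router\n",
    "   * Link ID: 10.1.155.20\n",
    "     Link Type: StubNet\n",
]

kept = drop_header_lines(sample)
print("".join(kept), end="")
```

Using "Link State" rather than just "Link" as the prefix matters here: the "Link Type" line stays in the output, and the "* Link ID" line is kept because after stripping it starts with "*".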

You could also use a regular expression to ensure that the line has to start with some spaces, and then one of those keywords:

import re
for line in lines:
    if not re.match(r"\s+(Area|OSPF|Link State)", line):
        fout.write(line)
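The leftover empty lines mentioned in the question are not handled by either loop. One possible reading of the clean.txt sample is that runs of empty lines should shrink to a single separator and that empty lines at the very start and end should go. The sketch below assumes exactly that; the function name clean and the sample data are hypothetical.

```python
import re

# Same pattern as in the loop above
HEADER_RE = re.compile(r"\s+(Area|OSPF|Link State)")


def clean(lines):
    """Drop header lines, collapse runs of blank lines, trim blank edges."""
    out = []
    for line in lines:
        if HEADER_RE.match(line):
            continue  # header line: skip it
        if not line.strip():
            # Blank line: keep one only if the previous kept line is not blank
            if out and out[-1].strip():
                out.append("\n")
            continue
        out.append(line)
    # A single blank line may still be left at the very end
    while out and not out[-1].strip():
        out.pop()
    return out


# Hypothetical sample, shaped like the text file in the question
lines = [
    "\n",
    "     OSPF Process 1 with Router ID 1.1.1.1\n",
    "                 Area: 0.0.0.11\n",
    "         Link State Database \n",
    "\n",
    "\n",
    "  Type      : Router\n",
    "     Priority : Low\n",
    "                         Area: 0.0.0.13\n",
    "                 Link State Database\n",
    "\n",
    "\n",
    "  Type      : Router\n",
    "\n",
]

cleaned = clean(lines)
print("".join(cleaned), end="")
```

If the real requirement is different (for example, keep every blank line that was not next to a header), the blank-line branch is the only part that needs to change.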

You can use a regular expression to find specific text and remove it. Below is sample code; you can experiment with different regexes to suit your requirements.

Try the code below:

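import re

# "Link State" rather than "Link", so "Link ID" / "Link Type" lines are kept
regex = r"OSPF|Area|Link State"
for line in lines:
    if not re.search(regex, line):
        print(line, end="")  # Python 3; the line already ends with a newline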

Instead of reading line by line, you could read the entire content of the text file and use a single pattern for that specific block, taking into account the digits that can vary.

^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database\s*(?:\n|$)

Explanation

  • ^ Start of the line (the pattern is used with re.MULTILINE)
  • [ \t]* Match 0+ times a space or tab
  • OSPF Process \d+ with Router ID \d+(?:\.\d+){3} Match the literal text, with \d+ for the Process number and the dotted format for the Router ID
  • \s*Area: \d+(?:\.\d+){3} Match Area: followed by 1+ digits, then a dot and 1+ digits repeated 3 times
  • \s*Link State Database Match 0+ whitespace chars and the literal text
  • \s*(?:\n|$) Match 0+ whitespace chars, then either match a newline or assert the end of the line/string
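To see the pattern at work without touching any files, here is a minimal in-memory sketch. The function name remove_header_blocks and the sample text are made up; the pattern is the one explained above.

```python
import re

# Pattern from the answer; Process number, Router ID and Area may vary
PATTERN = (
    r"^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}"
    r"\s*Area: \d+(?:\.\d+){3}"
    r"\s*Link State Database\s*(?:\n|$)"
)


def remove_header_blocks(text):
    """Remove every three-line OSPF header block from text."""
    return re.sub(PATTERN, "", text, flags=re.MULTILINE)


# Hypothetical sample with two header blocks that differ only in their numbers
text = (
    "     OSPF Process 1 with Router ID 1.1.1.1\n"
    "                 Area: 0.0.0.11\n"
    "         Link State Database \n"
    "\n"
    "some text\n"
    "         OSPF Process 1 with Router ID 2.1.1.1\n"
    "                 Area: 0.0.0.12\n"
    "         Link State Database \n"
    "\n"
    "more text\n"
)

cleaned = remove_header_blocks(text)
print(cleaned, end="")
```

Because the trailing \s* is greedy, the empty line after each block is removed together with the block in this sample.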


For example:

import re

filename = 'raw.txt'
pattern = r"^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database\s*(?:\n|$)"

# Read the whole file at once so the pattern can span several lines
with open(filename, 'r') as fin:
    res = re.sub(pattern, "", fin.read(), flags=re.MULTILINE)

# Write the cleaned text; the with block closes the file automatically
with open("clean.txt", "w") as text_file:
    text_file.write(res)

Edit

To also match an empty line after the block, you could add the following after Database:

  • [ \t]* Match 0+ times a space or tab
  • (?: Non-capturing group
    • (?:\r?\n|\r)[ \t]* Match a newline followed by 0+ tabs or spaces
  • )? Close the non-capturing group and make it optional
  • $ Assert the end of the line (end of the string without re.MULTILINE)

Full pattern:

^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database[ \t]*(?:(?:\r?\n|\r)[ \t]*)?$
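A quick way to check what the full pattern consumes is to run it on a small string. This is only a sketch: the sample text is made up, and re.MULTILINE is assumed, as in the earlier example.

```python
import re

# Full pattern from the edit above
FULL_PATTERN = (
    r"^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}"
    r"\s*Area: \d+(?:\.\d+){3}"
    r"\s*Link State Database[ \t]*(?:(?:\r?\n|\r)[ \t]*)?$"
)

# Hypothetical sample: the line after the header contains only spaces
text = (
    "     OSPF Process 1 with Router ID 1.1.1.1\n"
    "                 Area: 0.0.0.11\n"
    "         Link State Database \n"
    "   \n"
    "some text\n"
)

result = re.sub(FULL_PATTERN, "", text, flags=re.MULTILINE)
print(repr(result))
```

In this sample the match covers the three header lines and the spaces on the following line, but $ only asserts a position, so the newline of that whitespace-only line stays in the result as a single empty line.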



 