
Remove multiple lines in Python

I have a file that looks like this:

<VirtualHost *:80>
    ServerName Url1
    DocumentRoot Url1Dir
</VirtualHost>

<VirtualHost *:80>
    ServerName Url2
    DocumentRoot Url2Dir
</VirtualHost>

<VirtualHost *:80>
    ServerName REMOVE
</VirtualHost>

<VirtualHost *:80>
    ServerName Url3
    DocumentRoot Url3Dir
</VirtualHost>

I want to remove this block (it never changes):

<VirtualHost *:80>
    ServerName REMOVE
</VirtualHost>

I have tried to find the whole piece of code by using the code below, but it doesn't seem to work.

with open("out.txt", "wt") as fout:
        with open("in.txt", "rt") as fin:
            for line in fin:
                fout.write(line.replace("<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n", ""))

I have looked for a solution to my problem but have come up empty-handed, so any help is much appreciated.

And before you downvote, I would really like to hear why.

The loop in the question tests one line at a time, so a pattern that spans three lines can never match. The quickest fix is to read the whole file into a string, perform the replacement, and then write the string out to the file you need. For example:

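#!/usr/bin/python

# Read the whole file so the pattern can span several lines.
with open('in.txt', 'r') as f:
    text = f.read()

# The trailing "\n\n" also removes the blank line after the block.
text = text.replace("<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n\n", '')

with open('out.txt', 'w') as f:
    f.write(text)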
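To see the difference between the two approaches without touching any files, here is a minimal sketch that runs both on an in-memory string. `SAMPLE` and `BLOCK` are made-up names for illustration, and the sample is a shortened version of the file in the question.

```python
# Hypothetical in-memory stand-in for in.txt (shortened).
SAMPLE = (
    "<VirtualHost *:80>\n"
    "    ServerName Url1\n"
    "</VirtualHost>\n"
    "\n"
    "<VirtualHost *:80>\n"
    "    ServerName REMOVE\n"
    "</VirtualHost>\n"
    "\n"
    "<VirtualHost *:80>\n"
    "    ServerName Url3\n"
    "</VirtualHost>\n"
)

# The block to delete, including the blank line that follows it.
BLOCK = "<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n\n"

# Line by line, as in the question: each piece is a single line,
# so the three-line pattern never occurs in it.
per_line = "".join(
    line.replace(BLOCK, "") for line in SAMPLE.splitlines(keepends=True)
)

# Whole text at once: the pattern is free to span newlines.
whole = SAMPLE.replace(BLOCK, "")

print(per_line == SAMPLE)   # True: nothing was removed
print("REMOVE" in whole)    # False: the block is gone
```

The only change between the two is how much text `replace` gets to see at once.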

Here is a finite-automaton solution that is easy to modify later in development. It may look complicated at first, but notice that you can read the code for each status value independently. You can draw a graph on paper (states as circles, transitions as arrows) to get an overview of what it does.

status = 0      # init -- waiting for the VirtualHost section
lst = []        # lines of the VirtualHost section
with open("in.txt") as fin, open("out.txt", "w") as fout:
    for line in fin:

        #-----------------------------------------------------------
        # Waiting for the VirtualHost section, copying.
        if status == 0: 
            if line.startswith("<VirtualHost"):
                # The section was found. Postpone the output.
                lst = [ line ]  # first line of the section
                status = 1
            else:
                # Copy the line to the output.
                fout.write(line)

        #-----------------------------------------------------------
        # Waiting for the end of the section, collecting.
        elif status == 1:   
            if line.startswith("</VirtualHost"):
                # The end of the section found, and the section
                # should not be ignored. Write it to the output.
                lst.append(line)            # collect the line
                fout.write(''.join(lst))    # write the section
                status = 0  # change the status to "outside the section"
                lst = []    # not necessary, but less error-prone for future modifications
            else:
                lst.append(line)    # collect the line
                if 'ServerName REMOVE' in line: # Should this section be ignored?
                    status = 2      # special status for ignoring this section
                    lst = []        # not necessary

        #-----------------------------------------------------------
        # Waiting for the end of the section that should be ignored.
        elif status == 2:   
            if line.startswith("</VirtualHost"):
                # The end of the section found, but the section should be ignored.
                status = 0  # outside the section
                lst = []    # not necessary
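The same three states can also be wrapped in a generator, which makes the automaton testable on a plain list of lines. This is only a sketch: the names `filter_vhosts`, `COPYING`, `COLLECTING`, `SKIPPING` and the `marker` parameter are made up here and stand for status 0, 1 and 2 above.

```python
# Hypothetical names for status 0, 1 and 2 of the automaton.
COPYING, COLLECTING, SKIPPING = range(3)


def filter_vhosts(lines, marker="ServerName REMOVE"):
    """Yield lines, dropping every VirtualHost section that contains marker."""
    state = COPYING
    section = []                      # lines of the section being collected
    for line in lines:
        if state == COPYING:
            if line.startswith("<VirtualHost"):
                section = [line]      # postpone the output
                state = COLLECTING
            else:
                yield line
        elif state == COLLECTING:
            section.append(line)
            if line.startswith("</VirtualHost"):
                yield from section    # section is kept, flush it
                section = []
                state = COPYING
            elif marker in line:
                section = []          # section is dropped
                state = SKIPPING
        elif state == SKIPPING:
            if line.startswith("</VirtualHost"):
                state = COPYING


sample = [
    "<VirtualHost *:80>\n", "    ServerName Url1\n", "</VirtualHost>\n", "\n",
    "<VirtualHost *:80>\n", "    ServerName REMOVE\n", "</VirtualHost>\n", "\n",
    "<VirtualHost *:80>\n", "    ServerName Url3\n", "</VirtualHost>\n",
]
result = list(filter_vhosts(sample))
print("".join(result))
```

With files, the call would be `fout.writelines(filter_vhosts(fin))`. Like the code above, this keeps the blank line that followed the removed section.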

While the above answer is pragmatic, it is fragile and not very flexible.
Here is something somewhat less fragile:

import re

def remove_entry(servername, filename):
    """Parse the file, look for the entry and return the new content.

    :param str servername: The server name to look for
    :param str filename: Path of the file to parse
    :return: The file content without the matching entry
    :rtype: str
    """
    with open(filename) as f:
        lines = f.readlines()

    starttag_line = None
    pattern_found = False

    for line, content in enumerate(lines):
        if '<VirtualHost ' in content:
            starttag_line = line
            pattern_found = False   # a match only counts inside this vhost
        # look for the entry (case-insensitive)
        if re.search(r'ServerName\s+' + re.escape(servername), content, re.I):
            pattern_found = True
        # at the next vhost end tag, remove the whole vhost entry
        if pattern_found and '</VirtualHost>' in content:
            del lines[starttag_line:line + 1]
            break

    # the content is returned unchanged if no matching entry was found
    return "".join(lines)


filename = '/tmp/file.conf'

# new file content
print(remove_entry('remove', filename))
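For comparison, the whole-file approach from the first answer can get the same case-insensitive, whitespace-tolerant matching from a single regular expression. This is a sketch under assumptions, not a full Apache config parser: `remove_vhost` is a made-up name, and it assumes sections are not nested and that the server name is the last thing on its line.

```python
import re


def remove_vhost(text, servername):
    """Return text without the VirtualHost blocks whose ServerName is servername."""
    not_end = r"(?:(?!</VirtualHost>).)*?"   # any text that does not cross an end tag
    pattern = re.compile(
        r"<VirtualHost\b[^>]*>"              # opening tag
        + not_end
        + r"ServerName[ \t]+" + re.escape(servername) + r"[ \t]*\n"
        + not_end
        + r"</VirtualHost>[ \t]*\n?"         # closing tag
        + r"\n*",                            # blank lines after the block
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", text)


sample = (
    "<VirtualHost *:80>\n    ServerName Url1\n</VirtualHost>\n\n"
    "<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n\n"
    "<VirtualHost *:80>\n    ServerName Url3\n</VirtualHost>\n"
)
cleaned = remove_vhost(sample, "remove")
print(cleaned)
```

The negative lookahead keeps a match from starting in one section and ending in another, so only the section that actually contains the name is removed.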
