
Delete everything in a file after the last occurrence of a string

I want to write a program that looks through files, finds every incomplete file (one without </Module> at the end), prints the last abnumber found in that file, and deletes every line from the last line containing abnumber (inclusive) to the end of the file.

My file looks like this:

<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
    <item name="item0" value="100" />
    <item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
    <item name="item1" value="100" />
    <item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
    <item name="item1" value="something11:

something11" />
    <item name="item2" value="233" />
    <item name="item3" value="233" />
    <item name="item4" value="something12:

12something" />
</object>
<object id="1238" name="name2" abnumber="4">
    <item name="item8" value="something12:
    <item name="item9" value="233" />

and in the end it should look like this:

<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
    <item name="item0" value="100" />
    <item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
    <item name="item1" value="100" />
    <item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
    <item name="item1" value="something11:

something11" />
    <item name="item2" value="233" />
    <item name="item3" value="233" />
    <item name="item4" value="something12:

12something" />
</object>

with 4 printed.

I started with something like this, but I feel like I am doing everything wrong:

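import os

Mainfile = 'path'  # directory that contains the files
for filename in os.listdir(Mainfile):
    filepath = os.path.join(Mainfile, filename)
    with open(filepath, 'r', encoding="utf-8") as file:
        line_list = file.readlines()
    # A complete file has the closing </Module> tag on some line
    if not any("</Module>" in line for line in line_list):
        # Walk backwards to find the last line that mentions abnumber
        for line in reversed(line_list):
            if 'abnumber' in line:
                print(line)
                break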

You can use the re module to get this result:

<object([\s\S]*?)<\/object> matches each complete <object ...> ... </object> block

abnumber=\"([0-9.]+) captures the abnumber of the incomplete tag

<Module.*|<object(?:[\s\S]*?)<\/object> matches the well-formed parts of the XML data

import re

data = """<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
    <item name="item0" value="100" />
    <item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
    <item name="item1" value="100" />
    <item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
    <item name="item1" value="something11:

something11" />
    <item name="item2" value="233" />
    <item name="item3" value="233" />
    <item name="item4" value="something12:

12something" />
</object>
<object id="1238" name="name2" abnumber="4">
    <item name="item8" value="something12:
    <item name="item9" value="233" />"""


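# Remove every complete <object>...</object> block; what remains holds the incomplete tag
invalid_XML_Tag = re.sub(r"<object([\s\S]*?)<\/object>", '', data)
# Raw strings keep the regex escapes (\s, \S, \/) from being read as string escapes
abnnumber_value = re.findall(r'abnumber="([0-9.]+)', invalid_XML_Tag)
print("abnumber of invalid tag => {0}".format(abnnumber_value))

# Keep only the <Module ...> line and the complete <object> blocks
correct_xml_format = re.findall(r"<Module.*|<object(?:[\s\S]*?)<\/object>", data)
print("".join(correct_xml_format))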

Output:

abnumber of invalid tag => ['4']
<Module bs="Mainfile_1"><object id="1000" name="namex" abnumber="1">
    <item name="item0" value="100" />
    <item name="item00" value="100" />
</object><object id="1001" name="namey" abnumber="2">
    <item name="item1" value="100" />
    <item name="item00" value="100" />
</object><object id="1234" name="name1" abnumber="3">
    <item name="item1" value="something11:

something11" />
    <item name="item2" value="233" />
    <item name="item3" value="233" />
    <item name="item4" value="something12:

12something" />
</object>
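
The snippet above works on a string held in memory. To apply the same idea to the files on disk, as the question asks, something along these lines might work. It is only a sketch under a few assumptions: the helper names truncate_incomplete and clean_directory are made up for illustration, a file counts as complete when its text ends with </Module>, and everything from the line holding the last abnumber onward is dropped (as described in the question), even if that object happens to be closed.

```python
import re
from pathlib import Path

CLOSING_TAG = "</Module>"
ABNUMBER_RE = re.compile(r'abnumber="([0-9.]+)"')


def truncate_incomplete(text):
    """Return (cleaned_text, abnumber) for the contents of one file.

    Hypothetical helper: if the text already ends with the closing tag,
    or holds no abnumber at all, it is returned unchanged with None.
    """
    if text.rstrip().endswith(CLOSING_TAG):
        return text, None
    matches = list(ABNUMBER_RE.finditer(text))
    if not matches:
        return text, None
    last = matches[-1]
    # Cut at the start of the line that holds the last abnumber
    # (rfind returns -1 when there is no earlier newline, giving index 0).
    line_start = text.rfind("\n", 0, last.start()) + 1
    return text[:line_start], last.group(1)


def clean_directory(folder):
    """Truncate every incomplete file in folder and print its last abnumber."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        cleaned, abnumber = truncate_incomplete(text)
        if abnumber is not None:
            print(path.name, abnumber)
            # Overwrites the file in place; keep a backup if that matters.
            path.write_text(cleaned, encoding="utf-8")
```

Calling clean_directory('path') would then rewrite each incomplete file in that folder and print its name together with the removed abnumber. Since the files are overwritten in place, it is probably worth trying this on a copy of the data first.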
