
Remove a string and all lines before it from a file in Python

I have a file with thousands of lines of data in it. I am reading the file in and editing it.

The following tag appears about 900 lines in, sometimes more (it varies per file):

<Report name="test" xmlns:cm="http://www.domain.org/cm">

I need to remove that line and everything before it in several files. So I need the code to search for that tag and delete it and everything above it. The tag will not always be 900 lines down; its position varies, but the tag itself will always be the same.

I already have the code to read in the lines and write to a file. I just need the logic for finding that line and removing it and everything before it.

I tried reading the file in line by line and then writing to a new file once it hits that string, but the logic is incorrect:

readFile = open(firstFile)
lines = readFile.readlines()
readFile.close()
w = open('test','w')
for item in lines:
    if (item == "<Report name="test" xmlns:cm="http://www.domain.org/cm">"):
        w.writelines(item)
w.close()

In addition, the exact string will not be the same in each file. The value "test" will be different. I perhaps need to check for the tag name ""

Any help will be much appreciated.

You can use a flag like tag_found to check when lines should be written to the output. You initially set the flag to False, and then change it to True once you've found the right tag. When the flag is True, you copy the line to the output file. The tag line itself is not copied, because the flag only takes effect from the next line onward.

TAG = '<Report name="test" xmlns:cm="http://www.domain.org/cm">'

tag_found = False
with open('tag_input.txt') as in_file:
    with open('tag_output.txt', 'w') as out_file:
        for line in in_file:
            if not tag_found:
                if line.strip() == TAG:
                    tag_found = True
            else:
                out_file.write(line)

PS: The with open(filename) as in_file: syntax uses what Python calls a "context manager" (see here for an overview). The short explanation is that a context manager automatically takes care of closing the file safely for you when the with: block is finished, so you don't have to remember to put in my_file.close() statements.
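Since the question notes that the value "test" differs between files, the same flag idea can be adapted to match only the start of the tag. The following is a minimal sketch, not part of the original answer: the function name lines_after_tag and the prefix '<Report name=' are assumptions chosen for illustration.

```python
def lines_after_tag(lines, tag_prefix='<Report name='):
    """Yield only the lines that follow the first line starting with tag_prefix.

    tag_prefix is an assumed prefix; adjust it to whatever part of the tag
    stays constant across your files.
    """
    tag_found = False
    for line in lines:
        if tag_found:
            yield line
        elif line.strip().startswith(tag_prefix):
            # The tag line itself is dropped; copying starts with the next line.
            tag_found = True


# Small in-memory demonstration (hypothetical sample data).
sample = [
    'header junk\n',
    '<Report name="abc" xmlns:cm="http://www.domain.org/cm">\n',
    '<row>1</row>\n',
    '<row>2</row>\n',
]
result = list(lines_after_tag(sample))
print(result)
```

Because a file object is itself an iterable of lines, the generator can be combined with the with blocks above by calling out_file.writelines(lines_after_tag(in_file)). If the tag never appears, nothing is written.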

You can use a regular expression to match your line:

regex1 = '^<Report name=.*xmlns:cm="http://www.domain.org/cm">$'

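Get the indexes of the items that match the regex (the result is a list, one entry per matching line):

listIndex = [i for i, item in enumerate(lines) if re.search(regex1, item)]

Slice the list, starting just after the first match so that the tag line itself is dropped as well:

listLines = lines[listIndex[0] + 1:]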

And write to a file (the lines returned by readlines() already end with a newline, so they are written as they are):

with open("filename.txt", "w") as fileOutput:
    fileOutput.writelines(listLines)

Pseudocode

Try something like this:

import re

regex1 = '^<Report name=.*xmlns:cm="http://www.domain.org/cm">$' # Variable @name
regex2 = '^<Report name=.*xmlns:cm=.*>$' # Variable @name & @xmlns:cm

with open(firstFile, "r") as fileInput:
    listLines = fileInput.readlines()

listIndex = [i for i, item in enumerate(listLines) if re.search(regex1, item)]
# listIndex = [i for i, item in enumerate(listLines) if re.search(regex2, item)] # Uncomment for variable @name & @xmlns:cm

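with open("out_" + firstFile, "w") as fileOutput:
    # Drop the first matching line and everything before it.
    # The lines already end with a newline, so they are written unchanged.
    fileOutput.writelines(listLines[listIndex[0] + 1:])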
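One case the snippet above leaves open is a file in which the tag never appears; indexing an empty match list would then raise an IndexError. Below is a hedged sketch of one way to handle that. The names TAG_RE and remove_through_tag are hypothetical, and returning the lines unchanged when no tag is found is an assumed policy, not something the question specifies.

```python
import re

# Assumed form of the tag: any name value, fixed namespace URL,
# optional surrounding whitespace.
TAG_RE = re.compile(
    r'^\s*<Report name="[^"]*" xmlns:cm="http://www\.domain\.org/cm">\s*$'
)


def remove_through_tag(lines, pattern=TAG_RE):
    """Return the lines after the first match, or all lines if nothing matches."""
    first = next((i for i, line in enumerate(lines) if pattern.search(line)), None)
    if first is None:
        # Assumed policy: leave the content untouched when the tag is missing.
        return list(lines)
    return lines[first + 1:]


# Hypothetical sample data for a quick check.
demo = [
    "preamble\n",
    '  <Report name="other" xmlns:cm="http://www.domain.org/cm">\n',
    "<data/>\n",
]
kept = remove_through_tag(demo)
print(kept)
```

The result can be written out with fileOutput.writelines(kept), as in the snippet above. Using next() with a default stops at the first match instead of scanning the whole list.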

