How to delete lines in a file up to a certain character in Python 3

I have a very big file that I need to parse. I don't need any of the lines up to the '&'; I just need the information from the '&' onward. How do I delete the lines before the '&'? This is what I have so far:

import re

original_file = 'file.rpt'
file_copy = 'file_copy.rpt'

with open(original_file, 'r') as rf:
    with open(file_copy, 'r+') as wf:
        for line in rf:
            #if statement to write after the '&' has been encountered?
            wf.write(line)

Input file:

sample text1
sample text2
sample text3
sample text4
&sample text5
sample text6

Expected output file:

&sample text5
sample text6

The .rpt file has 6 lines; lines 1-4 contain information that isn't needed. I want to delete lines 1-4 so I can focus on lines 5 and 6.

A better and safer way is to create a new file with the trimmed contents, so that you can check them before deleting the old file. My suggestion would look like this:


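original_file = 'file.rpt'
file_copy = 'file_copy.rpt'
omit = True  # skip lines until the first '&' is seen
with open(original_file, 'r') as rf:
    with open(file_copy, 'w') as wf:  # 'w' creates or truncates the copy
        for line in rf:
            if "&" in line:
                omit = False  # from this line on, write everything
            if omit:
                continue
            else:
                wf.write(line)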

This code omits all the lines before the first line containing &; that line and every line after it are written to the copy.
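The answer suggests checking the new file before deleting the old one. Once you have checked it, a minimal sketch of the last step is to move the copy over the original with os.replace, which overwrites the destination (atomically on POSIX when both files are on the same filesystem). The helper name replace_with_copy and its first-line check are hypothetical, just one way to do a quick sanity check before swapping:

```python
import os
import tempfile

def replace_with_copy(original_file, file_copy, marker='&'):
    """Move the checked copy over the original (hypothetical helper)."""
    # Hypothetical sanity check: the copy should start with the marker line
    with open(file_copy, 'r') as f:
        first_line = f.readline()
    if marker not in first_line:
        raise ValueError(f"{file_copy!r} does not start at a line containing {marker!r}")
    # os.replace overwrites original_file with file_copy; file_copy is gone afterwards
    os.replace(file_copy, original_file)

# Usage with throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    original_file = os.path.join(tmp, 'file.rpt')
    file_copy = os.path.join(tmp, 'file_copy.rpt')
    with open(original_file, 'w') as f:
        f.write('sample text1\nsample text2\n&sample text5\nsample text6\n')
    with open(file_copy, 'w') as f:
        f.write('&sample text5\nsample text6\n')
    replace_with_copy(original_file, file_copy)
    with open(original_file) as f:
        result = f.read()
    copy_still_exists = os.path.exists(file_copy)

print(repr(result))
```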

You can also handle the line that contains the & itself, keeping only the part after the &:

original_file = 'file.rpt'
file_copy = 'file_copy.rpt'
omit = True
with open(original_file, 'r') as rf:
    with open(file_copy, 'w') as wf:  # 'w' creates or truncates the copy
        for line in rf:
            if omit and "&" in line:
                # split only on the first '&' and keep what follows it
                before, after = line.split("&", 1)
                wf.write(after)
                omit = False
                continue
            if omit:
                continue
            else:
                wf.write(line)

The code above writes the part of the & line that follows the &, omitting the & and anything before it on that line, and then writes every following line.
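To see what the second snippet writes without touching any files, here is a sketch that applies the same logic to a list of lines. It swaps in str.partition, which always returns three parts and splits only at the first &, so a line with several & characters cannot raise the way unpacking split("&") into two names can. The function name after_marker is hypothetical:

```python
def after_marker(lines, marker='&'):
    """Yield what follows the first marker, then every later line unchanged."""
    found = False
    for line in lines:
        if found:
            yield line
        elif marker in line:
            # partition splits at the first marker only: (before, marker, after)
            _, _, after = line.partition(marker)
            found = True
            yield after
        # lines before the first marker are skipped

sample = ['sample text1\n', 'sample text2\n', '&sample text5\n', 'sample text6\n']
print(list(after_marker(sample)))  # ['sample text5\n', 'sample text6\n']
```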

EDIT

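Also check whether you're opening the second file in the correct mode. You should probably use 'w', which truncates the file first. 'r+' does not append: it opens an existing file for reading and writing with the position at the start, so your writes overwrite the old contents from the beginning and any leftover old text remains at the end (and it raises FileNotFoundError if the file doesn't exist). I'm not sure that is what you want.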
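A quick way to see the difference between the two modes is to try them on a throwaway file. This sketch (the helper demo_modes is just for illustration) writes "XY" over "abcdef" once with 'r+' and once with 'w':

```python
import os
import tempfile

def demo_modes():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.txt')
        with open(path, 'w') as f:
            f.write('abcdef')

        # 'r+' starts writing at position 0 and does NOT truncate
        with open(path, 'r+') as f:
            f.write('XY')
        with open(path) as f:
            after_r_plus = f.read()

        # 'w' truncates the file to zero length before writing
        with open(path, 'w') as f:
            f.write('XY')
        with open(path) as f:
            after_w = f.read()

    return after_r_plus, after_w

print(demo_modes())  # ('XYcdef', 'XY')
```

With 'r+', the old tail "cdef" survives after the new text, which is usually not what you want when writing a trimmed copy.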

You don't really need to modify your file if you just want to work with part of it. Building on your original code, you can load just the portion you want:

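def load_data(filename):
    with open(filename, 'r') as f:
        for line in f:
            if '&' in line:  # or if line.startswith('&'):
                break
        else:
            # the loop never hit break: no line contained '&'
            return []
        # 'line' is the first line with '&'; list(f) reads the remaining lines
        return [line] + list(f)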

The function load_data loads all the lines starting with the first line that contains &. You can then write the data to another file, or just process it as you choose.
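For example, with the sample input from the question written to a temporary file (the path is only for illustration), load_data returns just the last two lines. The function is repeated so the example runs on its own:

```python
import os
import tempfile

def load_data(filename):
    # Same function as above: skip lines until the first one containing '&'
    with open(filename, 'r') as f:
        for line in f:
            if '&' in line:
                break
        else:
            # The loop never hit break: no '&' anywhere in the file
            return []
        # 'line' is the first line with '&'; list(f) reads the rest
        return [line] + list(f)

sample = 'sample text1\nsample text2\nsample text3\nsample text4\n&sample text5\nsample text6\n'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.rpt')  # illustrative path
    with open(path, 'w') as f:
        f.write(sample)
    trimmed = load_data(path)

print(trimmed)  # ['&sample text5\n', 'sample text6\n']
```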

You can even turn it into a lazy generator that only yields lines as you need them:

def trim_data(filename):
    with open(filename, 'r') as f:
        for line in f:
            if '&' in line:  # or if line.startswith('&'):
                yield line
                break
        else:
            return  # no '&' found: yield nothing
        yield from f  # lazily yield the remaining lines

If copying the file is what you want to do, this way is even easier:

with open(file_copy, 'w') as f:
    for line in trim_data(original_file):
        f.write(line)
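If you prefer the standard library to do the skipping, itertools.dropwhile expresses the same lazy behaviour as trim_data: it discards items while a predicate is true and then yields everything else unfiltered. This is an alternative technique, not part of the original answer; the name trim_data_dropwhile and its marker parameter are hypothetical. The sketch also passes the generator straight to the file's writelines method, which accepts any iterable of strings:

```python
import os
import tempfile
from itertools import dropwhile

def trim_data_dropwhile(filename, marker='&'):
    """Lazily yield lines from the first one containing marker onwards."""
    with open(filename, 'r') as f:
        # dropwhile discards lines while the predicate is true, then yields
        # the first line that fails it and every later line unfiltered
        yield from dropwhile(lambda line: marker not in line, f)

with tempfile.TemporaryDirectory() as tmp:
    original_file = os.path.join(tmp, 'file.rpt')
    file_copy = os.path.join(tmp, 'file_copy.rpt')
    with open(original_file, 'w') as f:
        f.write('sample text1\nsample text2\nsample text3\nsample text4\n'
                '&sample text5\nsample text6\n')
    # writelines consumes the generator line by line
    with open(file_copy, 'w') as f:
        f.writelines(trim_data_dropwhile(original_file))
    with open(file_copy) as f:
        copied = f.read()

print(repr(copied))
```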

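Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.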

 
© 2020-2024 STACKOOM.COM