How to delete lines in a file up to a certain character in Python 3
I have a very big file that I need to parse. I don't need any of the lines up to '&'. I just need the information after the '&' in the file. How do I delete the lines before the '&'? This is what I have so far:
import re
original_file = 'file.rpt'
file_copy = 'file_copy.rpt'
with open(original_file, 'r') as rf:
    with open(file_copy, 'r+') as wf:
        for line in rf:
            #if statement to write after the '&' has been encountered?
            wf.write(line)
Input file:
sample text1
sample text2
sample text3
sample text4
&sample text5
sample text6
Expected output file:
&sample text5
sample text6
The rpt file has 6 lines; lines 1-4 are information that isn't needed. I want to delete lines 1-4 so I can focus on lines 5 and 6.
A better and safer way would be to create a new file with the smaller contents, so that you can check it before deleting the old file. So my suggestion would look like this:
original_file = 'file.rpt'
file_copy = 'file_copy.rpt'
omit = True  # stays True until the first line containing '&' is seen
with open(original_file, 'r') as rf:
    with open(file_copy, 'w') as wf:
        for line in rf:
            if "&" in line:
                omit = False
            if omit:
                continue
            else:
                wf.write(line)
This code omits every line before the first line containing &. That line itself, and everything after it, is written to the new file.
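The same skip-until-marker logic can also be expressed with itertools.dropwhile from the standard library. This is only a minimal sketch: lines_from_marker is a hypothetical name, the marker test is a plain substring check, and the sample text is the input shown in the question.

```python
import io
from itertools import dropwhile


def lines_from_marker(lines, marker="&"):
    """Skip lines until one contains `marker`; yield that line and all after it."""
    return dropwhile(lambda line: marker not in line, lines)


# The six-line sample from the question, held in memory instead of a file.
sample = io.StringIO(
    "sample text1\nsample text2\nsample text3\nsample text4\n"
    "&sample text5\nsample text6\n"
)
kept = list(lines_from_marker(sample))
# kept == ['&sample text5\n', 'sample text6\n']
print("".join(kept), end="")
```

Because dropwhile is lazy, passing an open file object instead of the StringIO reads the file line by line rather than loading it whole.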
You can also analyze the line with the & symbol:
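original_file = 'file.rpt'
file_copy = 'file_copy.rpt'
omit = True
with open(original_file, 'r') as rf:
    with open(file_copy, 'w') as wf:  # 'w' truncates any existing copy
        for line in rf:
            if omit and "&" in line:
                # split on the first '&' only, so extra '&' characters
                # later in the line do not break the unpacking
                before, after = line.split("&", 1)
                wf.write(after)
                omit = False
                continue
            if omit:
                continue
            else:
                wf.write(line)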
This variant also writes whatever follows the & on that line, and drops the & itself along with anything before it on the same line.
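EDIT
Also check that you are opening the second file in the right mode. You probably want 'w', which creates the file if needed and truncates it first. 'r+' requires the file to exist already and does not truncate it: writing starts at the beginning and overwrites the existing contents in place, so if the old contents are longer than what you write, the leftover tail stays in the file. I am not sure that is what you want.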
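To see how the two modes differ on a file that already has contents, here is a small self-contained sketch. The helper write_with_mode and the file contents are made up for illustration, and everything happens in a temporary directory.

```python
import os
import tempfile


def write_with_mode(path, mode, text):
    """Open an existing file in `mode`, write `text`, return the final contents."""
    with open(path, mode) as f:
        f.write(text)
    with open(path, "r") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file_copy.rpt")

    # Start from ten 'A' characters, then write three 'B's with 'r+'.
    with open(path, "w") as f:
        f.write("AAAAAAAAAA")
    after_r_plus = write_with_mode(path, "r+", "BBB")

    # Reset to the same starting contents, then write with 'w'.
    with open(path, "w") as f:
        f.write("AAAAAAAAAA")
    after_w = write_with_mode(path, "w", "BBB")

print(after_r_plus)  # BBBAAAAAAA  (old tail is still there)
print(after_w)       # BBB         (file was truncated first)
```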
You don't really need to modify your file if you just want to work with some portion of it. Using your original code, you can load the portion that you want:
def load_data(filename):
    with open(filename, 'r') as f:
        for line in f:
            if '&' in line:  # or if line.startswith('&'):
                break
        else:
            # the loop finished without a break: no line contains '&'
            return []
        return [line] + list(f)
The function load_data returns every line from the first line containing & onward (that line included), or an empty list if no line contains &. You can then write the data to another file, or just process it as you choose.
You can even make it into a lazy generator that will only return lines as you need them:
def trim_data(filename):
    with open(filename, 'r') as f:
        for line in f:
            if '&' in line:  # or if line.startswith('&'):
                yield line
                break
        else:
            return
        yield from f
Copying the file this way, if that's what you want to do, is even easier:
with open(copy_file, 'w') as f:
    for line in trim_data(original_file):
        f.write(line)
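If the goal really is to end up with the trimmed contents under the original filename, one option is to write the kept lines to a temporary file and swap it in only once the marker has been found. The following is a rough sketch, not a definitive implementation: trim_file_in_place is a hypothetical helper, and it assumes the file is text in the default encoding. Note that the replacement file gets the temporary file's permissions rather than the original's.

```python
import os
import tempfile


def trim_file_in_place(path, marker="&"):
    """Rewrite `path` so it starts at the first line containing `marker`.

    Hypothetical helper: returns True if the marker was found and the file
    was replaced, False (file left untouched) otherwise.
    """
    directory = os.path.dirname(os.path.abspath(path))
    found = False
    # Put the temporary file in the same directory so that the final
    # os.replace does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as wf, open(path, "r") as rf:
            for line in rf:
                if not found and marker in line:
                    found = True
                if found:
                    wf.write(line)
        if found:
            os.replace(tmp_path, path)  # swap the trimmed copy in
        return found
    finally:
        # Still present only when nothing was replaced (or on error).
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Calling trim_file_in_place('file.rpt') on the sample from the question would leave only the last two lines in file.rpt. A file with no & in it is left unchanged.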