Python: remove a varying number of lines from a txt file
I have several txt files whose lines are structured as follows:
File 1
Header1, xx, yy
Redundant line 1
Redundant line 2
Redundant line 3
Header2, #012345 (random numbers)
data content (to the end of file)
File 2
Header1, xx, yy
Redundant line 1
Redundant line 2
Redundant line 3
Redundant line 4
Header2, #67891 (random numbers)
data content (to the end of file)
File 3
Header1, xx, yy
Redundant line 1
Redundant line 2
Header2, #54321 (random numbers)
data content (to the end of file)
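Expected output:
For each file, I want to delete the redundant lines and keep only the Header1 line, the Header2 line with its #zzzzz number, and the data content that follows to the end of the file. The result should be saved to a new, separate file, so that each new file has the following structure: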
Header1, xx, yy
Header2, #zzzzz (keep random numbers from original file)
data content (to the end of file)
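My code:
My code does not work for every file, because the number of redundant lines varies from file to file. Could someone offer some suggestions? Thanks!
with open('File1.txt') as old, open('new_file1.txt', 'w') as new:
    lines = old.readlines()
    new.writelines(lines[0:1])  # keep Header1
    new.writelines(lines[N:])  # keep Header2 and following data content to the end (N differs per file)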
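You can define a variable N with an initial value of 1 and keep incrementing it by 1 until a line matches the regular expression .*?,\s*#\d+ (for the second header). The \s* allows for the space between the comma and the # that appears in the sample files:
import re

with open('File1.txt') as old, open('new_file1.txt', 'w') as new:
    lines = old.readlines()
    new.writelines(lines[:1])  # keep Header1
    N = 1
    # advance N until lines[N] is the Header2 line; stop at the end of the file if none matches
    while N < len(lines) and not re.match(r".*?,\s*#\d+", lines[N]):
        N += 1
    new.writelines(lines[N:])  # keep Header2 and following data content to the end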
Input file File1.txt:
Header1, xx, yy
Redundant line 1
Redundant line 2
Redundant line 3
Header2, #012345 (random numbers)
data content (to the end of file)
Output file new_file1.txt:
Header1, xx, yy
Header2, #012345 (random numbers)
data content (to the end of file)
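Since the question asks for the same cleanup on every file, the approach can be wrapped in a small helper and applied in a loop. This is only a sketch: the names strip_redundant and convert, the list of input file names, and the "new_" output prefix are illustrative assumptions, and the Header2 pattern is assumed to be a comma, optional spaces, then # followed by digits.

```python
import re
from pathlib import Path

# Assumed shape of the Header2 line: a comma, optional spaces, then '#' and digits.
HEADER2 = re.compile(r".*?,\s*#\d+")

def strip_redundant(lines):
    """Return the first line (Header1) plus everything from the first Header2-like line on."""
    if not lines:
        return []
    for i in range(1, len(lines)):
        if HEADER2.match(lines[i]):
            return lines[:1] + lines[i:]
    # No Header2 line found: keep only Header1 rather than guessing.
    return lines[:1]

def convert(src, dst):
    """Read src, drop the redundant lines, and write the result to dst."""
    lines = Path(src).read_text().splitlines(keepends=True)
    Path(dst).write_text("".join(strip_redundant(lines)))

if __name__ == "__main__":
    # Hypothetical file names; adjust to the actual files.
    for name in ("File1.txt", "File2.txt", "File3.txt"):
        if Path(name).exists():
            convert(name, "new_" + name)
```

Keeping the filtering in a function that works on a list of lines makes it easy to check against a few sample lines before touching any real files.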