Cleaning a txt file. Best way to delete a recurring beginning-of-line code number
I have to clean a csv file that has the following structure:
    Schema Compare Sync Script 06/10/2016 11:05:03 Page 1
    1 --------------------------------------------------------------------------
    2 -- Play this script in ASIA@COG2 to make it look like ASIA@TSTCOG2
    3 --
    4 -- Please review the script before using it to make sure it won't
    5 -- cause any unacceptable data loss.
    ---
    ---
    14 Set define off;
    15
    16 ALTER TABLE ASIA_MART.FDM_INVOICE
    17 MODIFY(I_STATUS VARCHAR2(32 CHAR));
    --
    --
    Schema Compare Sync Script 06/10/2016 11:05:03 Page 2
    76 ACCRUED_GP FLOAT(126) NOT NULL,
    77 ACCRUALS_CREATE_BY VARCHAR2(64 CHAR) NOT NULL,
    --
    --
    150 MINEXTENTS 1
    Schema Compare Sync Script 06/10/2016 11:05:03 Page 3
    151 MAXEXTENTS UNLIMITED
My goal is to save only the SQL code to another file, without any of the comments or line numbers.

So far I have managed to eliminate the first comment section, which ends at the flag string "Set define off", and I can also catch the other recurring "Schema Compare Sync Script" headers. The challenge I still face is catching the numbered lines and stripping the numbers. My code does produce a list starting at line 15, but it also contains a strange duplicate of the date line.

First of all, I'm pretty sure this is not the best code either, so suggestions are more than welcome, and I would appreciate it if anyone has an idea of how to handle the numbered lines.

Here is my code:
    import re
    from itertools import dropwhile

    flag = 'Set define off'
    found = False
    buff = []

    with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
        searchlines = infile.readlines()
        for i, line in enumerate(searchlines):
            if flag in line:
                found = True
            if found:
                # iterate over the list after the flag and attach to the list buff
                for l in searchlines[i:i+1]:
                    buff.append(searchlines[i+2:len(searchlines)])
            else:
                searchlines.remove(line)
        # generator to append a list of strings to the list
        # values = ','.join(str(v) for v in buff)
        for i, line in enumerate(searchlines):
            for line in dropwhile(lambda line: line.startswith(r'\d+'), searchlines):
                buff.append(searchlines[i])
        outfile.write(''.join(str(v) for v in buff))
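For comparison, here is a sketch of how the whole cleanup could be done in a single regex pass. The `HEADER` and `NUMBERED` patterns and the `clean_sql` helper are illustrative names, not part of the code above; note that `str.startswith` (as used above) takes a literal prefix, not a regex, which is why `startswith(r'\d+')` never matches.

```python
import re

# Illustrative patterns (assumptions, not from the original code):
HEADER = re.compile(r'^Schema Compare Sync Script\b')  # recurring page header
NUMBERED = re.compile(r'^\s*(\d+)\s?(.*)$')            # e.g. "17 MODIFY(...)"

def clean_sql(lines):
    """Keep only numbered lines whose body is SQL, with the number removed."""
    out = []
    for raw in lines:
        line = raw.rstrip('\n')
        if HEADER.match(line):      # drop the repeated header line
            continue
        m = NUMBERED.match(line)
        if not m:                   # bare "--" separators carry no number
            continue
        body = m.group(2)
        if not body.strip() or body.startswith('--'):
            continue                # numbered blank or comment line
        out.append(body)
    return out
```

Reading `delta.txt` through this helper and writing `'\n'.join(clean_sql(infile))` to the output file would then replace the flag-and-buffer bookkeeping entirely.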
The number at the beginning of each line can be used as the filter:
    with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
        for line in infile:
            sline = line.split(" ")
            if len(sline) < 2:
                continue
            if sline[0].isdigit() and sline[1] != "--":
                outfile.write(line[len(sline[0])+1:])