
Cleaning a txt file. Best way to delete the recurring number at the beginning of each code line

I have to clean a file with the following structure:

Schema Compare Sync Script
06/10/2016 11:05:03 Page 1
1 --------------------------------------------------------------------------
2 -- Play this script in ASIA@COG2 to make it look like ASIA@TSTCOG2
3 --
4 -- Please review the script before using it to make sure it won't
5 -- cause any unacceptable data loss.
--- 
---
14 Set define off;
15
16 ALTER TABLE ASIA_MART.FDM_INVOICE
17 MODIFY(I_STATUS VARCHAR2(32 CHAR));
--
--
Schema Compare Sync Script
06/10/2016 11:05:03 Page 2
76 ACCRUED_GP FLOAT(126) NOT NULL,
77 ACCRUALS_CREATE_BY VARCHAR2(64 CHAR) NOT NULL,
--
--
150 MINEXTENTS 1
Schema Compare Sync Script
06/10/2016 11:05:03 Page 3
151 MAXEXTENTS UNLIMITED

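My goal is to save only the SQL code in another file, without any comments or line numbers.

So far I have been able to eliminate the first comment section, which ends with the flag string "Set define off", and I can also catch the other recurring lines such as "Schema Compare Sync Script". The challenge I am facing now is catching the line numbers and removing them. In fact, my code produces a list starting from line 15, but with a strange repetition of the date as well.

I am pretty sure this is not the best code either, so suggestions are more than welcome, and if anyone has an idea how to deal with the line numbers I would appreciate it.

Here is my code: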

import re
from itertools import dropwhile

flag = 'Set define off'
found = False
buff = []

with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
  searchlines = infile.readlines()
  for i, line in enumerate(searchlines):
    if flag in line :
      found = True
      if found:
        #iterate over the list after the flag and attach to the list buff
        for l in searchlines[i:i+1]:
          buff.append(searchlines[i+2:len(searchlines)])
    else:
      searchlines.remove(line)
      #generator to append a list of string to the list values = ','.join(str(v) for v in buff)
  for i, line in enumerate(searchlines):
    for line in dropwhile(lambda line: line.startswith(r'\d+'), searchlines):
      buff.append(searchlines[i])


  outfile.write(''.join(str(v) for v in buff))
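
A note on the `dropwhile` call above: `str.startswith` compares against literal text, so `line.startswith(r'\d+')` is only true for a line that literally begins with the characters `\d+`, and the numbered lines are never skipped. The following is a minimal sketch of the difference, using `re.match` instead. The sample lines are copied from the listing above, and the helper name `starts_with_number` is hypothetical.

```python
import re

def starts_with_number(line):
    """Return True if the line begins with one or more digits."""
    return re.match(r"\d+", line) is not None

samples = ["14 Set define off;", "Schema Compare Sync Script"]

# startswith treats its argument as plain text, not as a pattern
literal = [s.startswith(r"\d+") for s in samples]

# re.match anchors the pattern at the start of the string
pattern = [starts_with_number(s) for s in samples]
```

Keep in mind that the date header ("06/10/2016 11:05:03 Page 1") also starts with digits, so a digit test alone is not enough to tell it apart from a numbered code line.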

The numbers at the beginning of the lines can be used for filtering:

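with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
  for line in infile:
    # Split on single spaces; the first field is the line number, if there is one
    sline = line.split(" ")
    # Lines with a single field (e.g. a bare line number) carry no code
    if len(sline) < 2 : continue
    # Keep numbered lines only, and skip comments: startswith also catches "--\n" and "-----"
    if sline[0].isdigit() and not sline[1].startswith("--"):
        outfile.write(line[len(sline[0])+1:])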
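
The same filter could also be written with a regular expression, which makes the three cases (page headers, numbered comments, numbered code) explicit. This is only a sketch under the assumption that each line number is separated from the code by a single space, as in the sample shown in the question; the real export may use different spacing. The names `NUMBERED` and `clean_listing` are hypothetical.

```python
import re

# A leading line number, optionally followed by a space and the rest of the line
NUMBERED = re.compile(r"^(\d+)(?: (.*))?$")

def clean_listing(lines):
    """Yield only the code text from a numbered listing."""
    for raw in lines:
        line = raw.rstrip("\n")
        match = NUMBERED.match(line)
        if match is None:
            continue  # page headers, date lines and other unnumbered lines
        body = match.group(2) or ""
        if not body.strip() or body.lstrip().startswith("--"):
            continue  # empty numbered lines and "--" comments
        yield body

sample = [
    "Schema Compare Sync Script\n",
    "06/10/2016 11:05:03 Page 1\n",
    "3 --\n",
    "4 -- Please review the script before using it to make sure it won't\n",
    "14 Set define off;\n",
    "15\n",
    "16 ALTER TABLE ASIA_MART.FDM_INVOICE\n",
    "17 MODIFY(I_STATUS VARCHAR2(32 CHAR));\n",
]
cleaned = list(clean_listing(sample))
```

To write the result to a file, something like `outfile.writelines(text + "\n" for text in clean_listing(infile))` inside the same `with open(...)` block as above should work. The date header is rejected here because the digits "06" are followed by "/" rather than a space or the end of the line.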

