
Cleaning a txt file: best way to delete the recurring line numbers at the beginning of each line

I have to clean a text file that has this structure:

Schema Compare Sync Script
06/10/2016 11:05:03 Page 1
1 --------------------------------------------------------------------------
2 -- Play this script in ASIA@COG2 to make it look like ASIA@TSTCOG2
3 --
4 -- Please review the script before using it to make sure it won't
5 -- cause any unacceptable data loss.
--- 
---
14 Set define off;
15
16 ALTER TABLE ASIA_MART.FDM_INVOICE
17 MODIFY(I_STATUS VARCHAR2(32 CHAR));
--
--
Schema Compare Sync Script
06/10/2016 11:05:03 Page 2
76 ACCRUED_GP FLOAT(126) NOT NULL,
77 ACCRUALS_CREATE_BY VARCHAR2(64 CHAR) NOT NULL,
--
--
150 MINEXTENTS 1
Schema Compare Sync Script
06/10/2016 11:05:03 Page 3
151 MAXEXTENTS UNLIMITED

My goal is to keep, in another file, only the SQL code, without any comments or line numbers.

So far I have been able to eliminate the first comment block, which ends with the flag string "Set define off", and I could also catch the other header lines such as "Schema Compare Sync Script" should they be a problem. The challenge for me so far has been catching the line numbers and eliminating them. Currently my code produces a list starting from line 15, but also a strange recurrence of the date.

First, I am quite sure this is not the best code, so suggestions are more than welcome, and if someone has an idea of how to get rid of the line numbers, it would be more than appreciated.

Here is my code:

import re
from itertools import dropwhile

flag = 'Set define off'
found = False
buff = []

with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
  searchlines = infile.readlines()
  for i, line in enumerate(searchlines):
    if flag in line :
      found = True
      if found:
        #iterate over the list after the flag and attach to the list buff
        for l in searchlines[i:i+1]:
          buff.append(searchlines[i+2:len(searchlines)])
    else:
      searchlines.remove(line)
      #generator to append a list of string to the list values = ','.join(str(v) for v in buff)
  for i, line in enumerate(searchlines):
    for line in dropwhile(lambda line: line.startswith(r'\d+'), searchlines):
      buff.append(searchlines[i])


  outfile.write(''.join(str(v) for v in buff))

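Every line you want to keep starts with a line number, so the digits at the start of the line can be used for filtering. Page headers and timestamps do not start with a bare number, and comment lines have "--" right after it: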

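with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
  for line in infile:
    sline = line.split(" ")
    # Skip blank lines and lines that hold only a line number.
    if len(sline) < 2: continue
    # Keep numbered lines unless the text after the number is a "--" comment.
    # startswith is needed because the token may be "--\n" or a long dash rule.
    if sline[0].isdigit() and not sline[1].startswith("--"):
      # Drop the number and the single space that follows it.
      outfile.write(line[len(sline[0])+1:])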
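
The same filtering can also be written with a regular expression, which makes the "number, then whitespace, then text" rule explicit. The following is only a sketch under assumptions taken from the sample in the question: the helper name `strip_listing` and the pattern `NUMBERED` are hypothetical, and the pattern assumes a line number is followed by a space or tab, or by nothing at all. Note that `line.startswith(r'\d+')` in the question compares against the literal characters `\d+`, not against a digit pattern, which is why `re` is used here.

```python
import re

# Assumed layout: digits, then either end of line or one space/tab plus the text.
# "06/10/2016 ..." does not match because "/" follows the digits.
NUMBERED = re.compile(r"^\d+(?:[ \t](.*))?$")

def strip_listing(lines):
    """Yield the SQL text of numbered lines, skipping headers and '--' comments."""
    for line in lines:
        match = NUMBERED.match(line.rstrip("\n"))
        if match is None:
            continue  # page header, timestamp, or any line without a line number
        text = match.group(1) or ""
        if not text.strip() or text.lstrip().startswith("--"):
            continue  # bare line number or SQL comment
        yield text
```

To write the result to a file, something like `outfile.writelines(text + "\n" for text in strip_listing(infile))` inside the same `with open(...)` block should work, since the function accepts any iterable of lines.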
