Delete x-line paragraphs from text file with Python
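I have a long text file in which each paragraph has either 6 or 7 lines. I need to write all the seven-line paragraphs to one file and all the six-line paragraphs to another. Alternatively, I could delete the 6-line (or 7-line) paragraphs. Paragraphs are separated by a blank line (or two blank lines). Example of the text file: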
Firs Name Last Name
address1
Address2
Note 1
Note 2
Note3
Note 4

First Name LastName
add 1
add 2
Note2
Note3
Note4

etc...
I want to use Python 3 on Windows. Any help is welcome. Thanks!
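As a welcome to Stack Overflow, and because I think you have by now searched further for code yourself, I propose the following code.
It verifies that no paragraph has more than 7 lines or fewer than 6. When such paragraphs exist in the source, it prints a warning.
You can remove all the print calls to get clean code, but with them you can follow the algorithm.
I think there are no bugs in it, but don't take that as 100% certain.
This isn't the only way to do it, but I chose an approach that works for files of any size: iterating one line at a time. It would also be possible to read the whole file at once and then split it into lines, or to process it with the help of regular expressions. However, when a file is very large, reading it all at once consumes a lot of memory.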
with open('source.txt') as fsource, \
     open('SIX.txt', 'w') as six, open('SEVEN.txt', 'w') as seven:

    buf = []                  # lines of the paragraph being read
    cnt = 0                   # number of the line just read
    exceeding7paragraphs = 0
    tinyparagraphs = 0

    line = 'go'
    while line:               # readline() returns '' only at EOF
        line = fsource.readline()
        cnt += 1
        buf.append(line)

        if len(buf) < 6 and line.rstrip('\n\r') == '':
            # a void line (or EOF) before the sixth line: paragraph too short
            tinyparagraphs += 1
            print(cnt, repr(line), "this line of paragraph < 6 is void,"
                  "\nthe treatment of all this paragraph is skipped\n"
                  "\n# " + str(cnt) + ' ' + repr(line) + " skipped line ")
            buf = []
            # skip void lines up to the first line of the next paragraph
            while line and line.rstrip('\n\r') == '':
                line = fsource.readline()
                cnt += 1
                if line == '':
                    print("line", cnt, "is '' , EOF -> the program will be stopped")
                elif line.rstrip('\n\r') == '':
                    print('#', cnt, repr(line))
                else:
                    buf.append(line)
                    print('!', cnt, repr(line), ' put in void buf')
        else:
            print(cnt, repr(line), ' put in buf')

        if len(buf) == 6:
            line = fsource.readline()  # reading a potential seventh line of a paragraph
            cnt += 1

            if line.rstrip('\n\r'):  # means the content of the seventh line isn't void
                buf.append(line)
                print(cnt, repr(line), 'seventh line put in buf')

                line = fsource.readline()
                cnt += 1

                if line.rstrip('\n\r'):  # means the content of the eighth line isn't void
                    exceeding7paragraphs += 1
                    print(cnt, repr(line), "the eighth line isn't void,"
                          "\nthe treatment of all this paragraph is skipped"
                          "\neighth line skipped")
                    buf = []
                    # skip the remaining lines of the oversized paragraph
                    while line and line.rstrip('\n\r'):
                        line = fsource.readline()
                        cnt += 1
                        if line == '':
                            print("line", cnt, "is '' , EOF -> the program will be stopped")
                        elif line.rstrip('\n\r') == '':
                            print('\n#', cnt, repr(line))
                        else:
                            print(str(cnt) + ' ' + repr(line) + ' skipped line')
                else:
                    if line == '':
                        print(cnt, "line is '' , EOF -> the program will be stopped\n")
                    else:  # line.rstrip('\n\r') is ''
                        print(cnt, 'eighth line is void', repr(line))
                    seven.write(''.join(buf) + '\n')
                    print(buf, '\n', len(buf), 'lines recorded in file SEVEN\n')
                    buf = []

            else:
                print(cnt, repr(line), 'seventh line: void')
                six.write(''.join(buf) + '\n')
                print(buf, '\n', len(buf), 'lines recorded in file SIX')
                buf = []
                if line == '':
                    print("line", cnt, "is '' , EOF -> the program will be stopped")
                else:
                    print('\nthe line is', cnt, repr(line))

            # skip void lines up to the first line of the next paragraph
            while line and line.rstrip('\n\r') == '':
                line = fsource.readline()
                cnt += 1
                if line == '':
                    print("line", cnt, "is '' , EOF -> the program will be stopped")
                elif line.rstrip('\n\r') == '':
                    print('#', cnt, repr(line))
                else:  # line.rstrip('\n\r') != ''
                    buf.append(line)
                    print('!', cnt, repr(line), ' put in void buf')

    if exceeding7paragraphs > 0:
        print('\nWARNING :'
              '\nThere are ' + str(exceeding7paragraphs) +
              ' paragraphs whose number of lines exceeds 7.')
    if tinyparagraphs > 0:
        print('\nWARNING :'
              '\nThere are ' + str(tinyparagraphs) +
              ' paragraphs whose number of lines is less than 6.')

print('\n====================================================================')
print('File SIX\n')
with open('SIX.txt') as six:
    print(six.read())

print('====================================================================')
print('File SEVEN\n')
with open('SEVEN.txt') as seven:
    print(seven.read())
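The answer mentions that the whole file could instead be read at once and split with a regular expression. A minimal sketch of that alternative follows, assuming the file fits comfortably in memory. The helper names `split_paragraphs`, `sort_by_length` and `write_paragraphs` are illustrative only and do not come from the original answer.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs (lists of lines) on runs of blank lines."""
    # Normalise Windows line endings and drop leading/trailing newlines.
    text = text.replace('\r\n', '\n').strip('\n')
    # A newline, optional whitespace-only lines, then a newline ends a paragraph.
    blocks = re.split(r'\n\s*\n', text)
    return [block.split('\n') for block in blocks if block.strip()]


def sort_by_length(paragraphs):
    """Group paragraphs into 6-line, 7-line and all other lengths."""
    six = [p for p in paragraphs if len(p) == 6]
    seven = [p for p in paragraphs if len(p) == 7]
    other = [p for p in paragraphs if len(p) not in (6, 7)]
    return six, seven, other


def write_paragraphs(path, paragraphs):
    """Write paragraphs to path (a hypothetical output file), one blank line apart."""
    with open(path, 'w') as out:
        out.write('\n\n'.join('\n'.join(p) for p in paragraphs))
        if paragraphs:
            out.write('\n')


# In-memory demonstration using the sample layout from the question.
sample = ("Firs Name Last Name\naddress1\nAddress2\nNote 1\nNote 2\nNote3\nNote 4\n"
          "\n\n"
          "First Name LastName\nadd 1\nadd 2\nNote2\nNote3\nNote4\n")
six, seven, other = sort_by_length(split_paragraphs(sample))
print(len(six), len(seven), len(other))
```

With a real file, the text would come from something like `open('source.txt').read()`, and `other` plays the role of the warnings above: it collects every paragraph that is neither 6 nor 7 lines long.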
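I also upvoted your question, because the problem isn't as easy to solve as it looks, and so as not to leave you with a single post and a downvote, which is a discouraging way to start. Next time, try to present your question better, as others have said.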
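EDIT:
Here is a simplified version for a text that contains only paragraphs of 6 or 7 lines, separated by exactly 1 or 2 blank lines, as stated in the question.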
with open('source2.txt') as fsource, \
     open('SIX.txt', 'w') as six, open('SEVEN.txt', 'w') as seven:

    buf = []
    line = fsource.readline()
    # skip leading blank lines to go to the first non-empty line
    while line and not line.rstrip('\r\n'):
        line = fsource.readline()

    while line:  # an empty string means EOF
        buf.append(line)  # this line is the first of a paragraph
        print('\n- first line of a paragraph', repr(line))
        for i in range(5):
            buf.append(fsource.readline())
        # at this point, 6 lines of a paragraph have been read
        print('-- buf 6 : ', buf)

        line = fsource.readline()
        print('--- line seventh', repr(line), id(line))
        if line.rstrip('\r\n'):
            buf.append(line)
            seven.write(''.join(buf) + '\n')
            buf = []
            line = fsource.readline()
        else:
            six.write(''.join(buf) + '\n')
            buf = []
        # at this point, line is the empty line after a paragraph or EOF
        print('---- line after', repr(line), id(line))

        line = fsource.readline()
        print('----- second line after', repr(line))
        # at this point, line is an empty line after a paragraph or EOF
        # or the first line of a new paragraph
        if not line:  # it is EOF
            break
        if not line.rstrip('\r\n'):  # it is a second empty line
            line = fsource.readline()
            # now line is the first of a new paragraph, or '' at EOF

print('\n====================================================================')
print('File SIX\n')
with open('SIX.txt') as six:
    print(six.read())

print('====================================================================')
print('File SEVEN\n')
with open('SEVEN.txt') as seven:
    print(seven.read())
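For comparison, here is a compact sketch of the same one-line-at-a-time idea without the tracing prints. It uses `itertools.groupby` from the standard library, which is not what the answer uses. The function names `iter_paragraphs` and `split_file` are illustrative assumptions, and any run of blank lines is treated as a separator.

```python
from itertools import groupby


def iter_paragraphs(lines):
    """Yield each paragraph as a list of lines without line endings.

    `lines` can be any iterable of strings, including an open file,
    so only one paragraph is held in memory at a time.
    """
    stripped = (line.rstrip('\r\n') for line in lines)
    # groupby clusters consecutive lines by whether they are blank or not.
    for is_blank, group in groupby(stripped, key=lambda s: not s.strip()):
        if not is_blank:
            yield list(group)


def split_file(source, six_path, seven_path):
    """Copy the 6-line and 7-line paragraphs of source into two files.

    Returns the number of paragraphs that had any other length.
    """
    skipped = 0
    with open(source) as src, \
         open(six_path, 'w') as six, open(seven_path, 'w') as seven:
        targets = {6: six, 7: seven}
        for paragraph in iter_paragraphs(src):
            out = targets.get(len(paragraph))
            if out is None:
                skipped += 1  # neither 6 nor 7 lines: not written anywhere
            else:
                out.write('\n'.join(paragraph) + '\n\n')
    return skipped
```

A call such as `split_file('source2.txt', 'SIX.txt', 'SEVEN.txt')` would mirror the file names used above. To delete the 6-line paragraphs instead, the same loop could write only the paragraphs whose length is not 6 to a single output file.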