Python script to remove multiple blank lines between paragraphs and end of file

I wrote a Python script to capture the data I want, but the resulting text file contains multiple paragraphs separated by a varying number of blank lines, anywhere from 2 to 8.

The file also has multiple blank lines at the end.

I would like Python to leave no more than 2 blank lines between paragraphs and no blank lines at the end of the text file.

I have experimented with a loop and line.strip, replace, etc., but I clearly have little idea how to put this together.

An example of what I have been using so far:

wf = open(FILE,"w+")
for line in wf:
         newline = line.strip('^\r\n')
         wf.write(newline)
         wf.write('\n')

Here's some untested code:

import re

new_lines = re.compile('\n{2,9}')

with open(FILE) as f:
    contents = f.read()
contents = re.sub(new_lines, '\n\n\n', contents.strip())
with open(FILE, 'w') as f:
    f.write(contents)

First, strip() removes the blank lines at the end of the contents (along with any other leading or trailing whitespace). Then the regular expression matches runs of 2 to 9 newlines, and re.sub() replaces each run with 3 newlines, that is, two blank lines. Note that a run of exactly 2 newlines (a single blank line) is also expanded to two blank lines.
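If single blank lines should be left alone and only longer runs trimmed, the pattern can be tightened. The following is a minimal sketch, not part of the answer above; the name `cap_blank_lines` is made up for illustration, and it assumes that lines containing only spaces or tabs count as blank.

```python
import re

# A newline followed by three or more lines that are empty or contain only
# spaces/tabs, i.e. a run of at least three blank lines.
_EXTRA_BLANKS = re.compile(r'\n(?:[ \t]*\n){3,}')


def cap_blank_lines(text):
    """Hypothetical helper: allow at most two blank lines between
    paragraphs and none at the end of the text."""
    # Remove trailing whitespace, including blank lines at the end.
    text = text.rstrip()
    # Collapse every run of three or more blank lines to exactly two.
    text = _EXTRA_BLANKS.sub('\n\n\n', text)
    # Finish with a single newline unless the text is empty.
    return text + '\n' if text else ''
```

To apply it to a file, read the contents with f.read(), pass them through the function, and write the result back with the file opened in 'w' mode.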

It's actually easier to remove all blank lines and then insert two blank lines between paragraphs (and none at the end) than to count the blank lines and remove them only when there are more than two. Unless you're dealing with huge files, I don't think there's going to be any performance difference between the two approaches. Here's a quick and dirty solution using re:

import re

# Read the whole file
with open('test.txt') as f:
    txt = f.read()
# Remove all blank lines (including whitespace-only ones)
txt = re.sub(r'\n\s*\n', '\n', txt)
# Add two blank lines after every line (each line is treated as a paragraph)
txt = re.sub(r'\n', '\n\n\n', txt)
# Remove the blank lines at the end of the file
txt = re.sub(r'\n*\Z', '', txt)
# Reopen in write mode so the file is overwritten instead of appended to
with open('test.txt', 'w') as f:
    f.write(txt)

Before:

One line below

None below
Three below



EOF with one blank line below (stackoverflow's code thingy omits it)

After:

One line below


None below


Three below


EOF with one blank line below
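As the before/after example shows, that snippet treats every non-blank line as its own paragraph. If consecutive non-blank lines should stay together, one option is to split the text on runs of blank lines and re-join the pieces. This is a minimal sketch, assuming a paragraph is any block of text delimited by one or more blank lines; `normalize_paragraphs` and `blank_lines` are hypothetical names.

```python
import re


def normalize_paragraphs(text, blank_lines=2):
    """Hypothetical helper: put exactly `blank_lines` blank lines between
    paragraphs and leave nothing after the last one."""
    # strip() drops leading/trailing whitespace, so no blank lines remain
    # at either end. Splitting on "newline, optional whitespace, newline"
    # cuts the text at every run of blank or whitespace-only lines.
    paragraphs = [p for p in re.split(r'\n\s*\n', text.strip()) if p]
    # n blank lines between two paragraphs means n + 1 newline characters.
    separator = '\n' * (blank_lines + 1)
    return separator.join(paragraphs)


sample = "One line below\n\nNone below\nThree below\n\n\n\nEOF\n\n"
cleaned = normalize_paragraphs(sample)
```

Here "None below" and "Three below" remain on adjacent lines because no blank line separates them in the input.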

I know the answer requested is Python, but I believe that might be overkill.

Why not preprocess the file directly in your shell? Use grep, sed, or awk to accomplish this.

Here is the grep version (note that it removes every empty line rather than keeping up to two):

$ grep -v '^$' input.txt > output.txt

Here is a quick reference I found

So far, the question has not really been answered. Here is a solution that works, but I think it could be better.

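# `text` is assumed to hold the contents of the file as a string
newtext = ''
counter = 0  # consecutive blank lines seen so far
for line in text.splitlines():
    line = line.strip()
    if len(line) == 0:
        counter += 1
        # Keep a blank line only while the run is at most 2 long
        if counter <= 2:
            newtext += line + '\n'
    else:
        newtext += line + '\n'
        counter = 0
# Drop the blank lines left at the end of the text
newtext = newtext.rstrip('\n') + '\n'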
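The same counting idea can also be written as a generator that works line by line, avoids repeated string concatenation, and holds blank lines back until the next non-blank line so that none are emitted at the end. This is a sketch under the assumption that whitespace-only lines count as blank; `limit_blank_lines` and `max_blank` are illustrative names, and unlike the loop above it keeps leading indentation by using rstrip() instead of strip().

```python
def limit_blank_lines(lines, max_blank=2):
    """Yield lines with runs of blank lines capped at `max_blank`.

    Blank lines before the first and after the last non-blank line
    are dropped.
    """
    pending = 0        # blank lines seen since the last non-blank line
    seen_text = False  # True once the first non-blank line was emitted
    for line in lines:
        line = line.rstrip()
        if not line:
            pending += 1
            continue
        if seen_text:
            # Emit the delayed blank lines, but never more than max_blank.
            for _ in range(min(pending, max_blank)):
                yield ''
        yield line
        seen_text = True
        pending = 0


text = "first\n\n\n\n\nsecond\nthird\n\n\n"
result = '\n'.join(limit_blank_lines(text.splitlines())) + '\n'
```

Because it accepts any iterable of lines, it can be fed an open file object as well as the result of splitlines().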
