
How to ensure two line breaks between each paragraph in Python

I am reading txt files into Python and want to make the paragraph breaks consistent. Between paragraphs there are sometimes one, two, three, or four blank lines, and occasionally several tens or hundreds.

It is obviously easy to strip out all the breaks, but I can only think of "botched" ways of making every gap exactly two breaks (i.e. a single blank line between each paragraph). All I can think of is specifying multiple strips/replaces for the different possible combinations of breaks, which gets unwieldy when the number of breaks is very large, or iteratively removing excess breaks until two are left, which I guess would be slow and would not scale well to many tens of thousands of txt files.

Is there a moderately fast (and ideally simple) way of achieving this?

import re
# Collapse every run of two or more line-break characters into exactly two.
x = re.sub(r"([\r\n]){2,}", r"\1\1", x)

You can try this. Here x is the string containing all the paragraphs.
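One caveat with the pattern above: a single Windows-style break ("\r\n") already counts as two line-break characters, and blank lines that contain spaces or tabs interrupt a run. The following is a minimal sketch of a variant that handles both cases. It is not the answerer's code; the function name normalize_breaks is hypothetical, and it assumes that "\n\n" is the desired output separator.

```python
import re

# Hypothetical helper, not part of the original answer.
# Matches one line break followed by one or more blank lines, where a
# blank line may contain only spaces or tabs; accepts "\n" and "\r\n".
_BLANK_RUN = re.compile(r"\r?\n(?:[ \t]*\r?\n)+")

def normalize_breaks(text):
    # Replace each run of blank lines with exactly one blank line.
    # Single line breaks inside a paragraph are left untouched.
    return _BLANK_RUN.sub("\n\n", text)

sample = "First.\n\n\n\n\nSecond.\r\n \r\n\r\nThird."
print(repr(normalize_breaks(sample)))  # 'First.\n\nSecond.\n\nThird.'
```

Compiling the pattern once keeps the per-file cost low when the same substitution is applied to many files.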

Here's one way.

# Read the file, drop empty lines, and rejoin with one blank line between paragraphs.
with open("text.txt") as f:
    r = f.read()
pars = [p for p in r.splitlines() if p.strip()]
print("\n\n".join(pars))

This assumes that by "paragraph" we mean a block of text that contains no line break.
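If paragraphs can span several lines, that assumption no longer holds. A rough sketch of a split-and-join approach that treats any run of blank lines as the paragraph boundary might look like the following. The function name normalize_paragraphs is hypothetical, and trimming whitespace around each paragraph is an assumption about the desired result.

```python
import re

# Hypothetical helper: paragraphs are blocks of text separated by one
# or more blank (or whitespace-only) lines and may span several lines.
def normalize_paragraphs(text):
    # Normalize line endings so "\r\n" and "\r" behave like "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A boundary is a newline, optional whitespace, then another newline.
    blocks = re.split(r"\n\s*\n", text)
    # Drop empty blocks and trim whitespace around each paragraph.
    paragraphs = [b.strip() for b in blocks if b.strip()]
    return "\n\n".join(paragraphs)

sample = "One\nstill one\n\n\n\nTwo\n \n\nThree\n"
print(repr(normalize_paragraphs(sample)))  # 'One\nstill one\n\nTwo\n\nThree'
```

Note that this also removes leading and trailing blank lines from the whole text, as well as any indentation on the first line of a paragraph.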
