Python basic text browser/scraper: how to remove extra blank lines but keep at least one between paragraphs
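I have created a basic text browser/scraper that does what I need. However, the text it retrieves from a website contains a large number of extra blank lines. Is there a way to remove the surplus blank lines while keeping at least one blank line between paragraphs?
Here is my code: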
import urllib.request
from bs4 import BeautifulSoup

url = input('Enter a URL starting with https or http: ')
webUrl = urllib.request.urlopen(url)
print('result code: ' + str(webUrl.getcode()))
data = webUrl.read()
soup = BeautifulSoup(data, features="html.parser")
# Drop <script> and <style> elements so only the visible text remains
for script in soup(["script", "style"]):
    script.extract()
text = soup.get_text()
print(text)
input('Scroll Up or Press ENTER to Exit')
Use re.sub
Replace multiple newlines with a single newline, allowing optional whitespace between and before them:
import re
text = re.sub(r"\s*\n", "\n", text)
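Note that the substitution above collapses every run of newlines to a single newline, so no blank line is left between paragraphs. If at least one blank line should survive, as the question asks, a variant along the following lines may work. This is only a minimal sketch: the function name `collapse_blank_lines` is hypothetical and not part of the original answer, and trimming the start and end of the text is an added assumption.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse each run of blank or whitespace-only lines into one blank line.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # "\n\s*\n" matches a newline, any whitespace (including further
    # newlines), and a final newline, i.e. one or more blank lines.
    # Replacing the match with "\n\n" keeps exactly one blank line.
    collapsed = re.sub(r"\n\s*\n", "\n\n", text)
    # Assumption: leading and trailing whitespace is unwanted, so trim it
    # and end the text with a single newline.
    return collapsed.strip() + "\n"


if __name__ == "__main__":
    sample = "First paragraph\n\n\n\nSecond paragraph\n  \n\t\nThird\nline\n\n\n"
    print(collapse_blank_lines(sample))
```

In the script from the question, this would be called on the result of `soup.get_text()` before printing, for example `print(collapse_blank_lines(text))`.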
Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please cite this site's URL or the original address.