
How to save a web page as a text file [Python]

I would like to save a web page (all of its content) as a text file, as if you right-clicked on the page and chose "Save Page As" -> "Save as text file" rather than saving it as an HTML file.

I have tried using the following code:

import urllib2

url = ''
page = urllib2.urlopen(url)
page_content = page.read()  # raw HTML, tags and entities included
f = open('file_text.txt', 'w')
f.write(page_content)
f.close()

My goal is to save the whole text without any HTML code. For example, I would like to read "è" instead of "&egrave;".
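Decoding character references is a separate step from stripping tags. A minimal sketch, assuming Python 3 (the code above targets Python 2) and using a made-up sample string, shows how the standard library's `html.unescape` turns references back into characters:

```python
import html

# Hypothetical sample text containing HTML character references
encoded = "caff&egrave; &amp; t&eacute;"

# html.unescape replaces named and numeric references with the characters they stand for
decoded = html.unescape(encoded)
print(decoded)
```

This only handles the entities; the tags themselves still have to be removed by an HTML-to-text step.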

Have a look at html2text, as mentioned elsewhere:

import urllib2
import html2text

url = ''
page = urllib2.urlopen(url)
html_content = page.read()
# Convert the HTML markup to plain (Markdown-style) text
rendered_content = html2text.html2text(html_content)
# The with block closes the file even if the write fails
with open('file_text.txt', 'w') as f:
    f.write(rendered_content)
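For readers on Python 3, where `urllib2` no longer exists, the same idea can be sketched with the standard library alone. This is not the html2text approach from the answer: it swaps in a small `html.parser.HTMLParser` subclass as a rough text extractor. The names `TextExtractor`, `html_to_text` and `save_page_as_text` are made up for illustration, and the page is assumed to be UTF-8 unless told otherwise.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &egrave; for us
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def save_page_as_text(url, path, encoding="utf-8"):
    """Download url and write its text content to path (encoding is assumed)."""
    with urlopen(url) as page:
        html = page.read().decode(encoding, errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))
```

Calling `save_page_as_text(url, 'file_text.txt')` would then write the text to disk. The extractor is deliberately crude: it puts each text fragment on its own line and ignores layout, so if html2text is installed, `html2text.html2text(html)` could be used in place of `html_to_text(html)` for nicer output.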

