Reading an HTML file from a folder in Python

Question

I want to read an HTML file in Python 3.4.3.

I have tried:

import urllib.request
fname = r"C:\Python34\html.htm"
HtmlFile = open(fname,'w')
print (HtmlFile)

This prints:

<_io.TextIOWrapper name='C:\\Python34\\html.htm' mode='w' encoding='cp1252'>

I want to get the HTML source so that I can parse it with Beautiful Soup.

Answer 1

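You have to read the contents of the file. Calling open() only returns a file object, which is what your print statement displays. Open the file in read mode ('r'), not write mode ('w'), because opening an existing file with 'w' truncates it.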

HtmlFile = open(fname, 'r', encoding='utf-8')
source_code = HtmlFile.read()
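
The same idea can be sketched with a `with` block, which closes the file automatically. This is only an illustration: the helper name `read_html` and the sample file contents are hypothetical, and the sample file is written to a temporary directory just so the snippet runs on its own.

```python
import os
import tempfile


def read_html(path, encoding="utf-8"):
    """Return the full text of the HTML file at `path` (hypothetical helper)."""
    # Mode 'r' reads the file; mode 'w' (as in the question) would truncate it.
    with open(path, "r", encoding=encoding) as html_file:
        return html_file.read()


# Create a small sample file so the example is self-contained (made-up content).
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "html.htm")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("<html><body><h1>Hello</h1></body></html>")

source_code = read_html(sample_path)
print(source_code)
```

The resulting string can then be handed to Beautiful Soup, for example `BeautifulSoup(source_code, "html.parser")`, assuming the `bs4` package is installed.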

Answer 2

I was trying to read a saved HTML file from a folder. I tried the code mentioned by Vikasa but got an error, so I changed the code and tried again, and it worked for me. The code is as follows:

    fname = 'page_source.html'  # this HTML file is stored in the same folder as the script
    html_file = open(fname, 'r')
    source_code = html_file.read()

Print the HTML page using:

    print(source_code)

This prints the content read from the page_source.html file.
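
The answer does not say which error occurred. One possibility, and it is only an assumption, is that the file was not saved as UTF-8, so decoding with encoding='utf-8' failed. A minimal sketch of a reader that tries a few encodings in turn is shown below; the helper name `read_html_with_fallback`, the list of encodings, and the sample file are all hypothetical.

```python
import os
import tempfile


def read_html_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn and return the first successful decode."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as html_file:
                return html_file.read()
        except UnicodeDecodeError:
            # This encoding does not match the file's bytes; try the next one.
            continue
    # Last resort: decode as UTF-8 and replace any undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as html_file:
        return html_file.read()


# Write a sample file in cp1252 so that a plain UTF-8 read would fail (made-up content).
sample_path = os.path.join(tempfile.mkdtemp(), "page_source.html")
with open(sample_path, "wb") as f:
    f.write("<p>café</p>".encode("cp1252"))

# UTF-8 decoding fails on this file, so the cp1252 fallback is used.
source_code = read_html_with_fallback(sample_path)
```

Guessing encodings can silently produce wrong characters, so if the file's actual encoding is known it is better to pass it to open() directly.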

2 answers

Answer 1: 7, accepted, 2015-09-13 07:58:22

Answer 2: 0, 2019-05-29 08:38:57