Reading an HTML File from Folder in Python

I want to read an HTML file in Python 3.4.3.

I have tried:

import urllib.request
fname = r"C:\Python34\html.htm"
HtmlFile = open(fname,'w')
print (HtmlFile)

This prints:

<_io.TextIOWrapper name='C:\\Python34\\html.htm' mode='w' encoding='cp1252'>

I want to get the HTML source so that I can parse it with Beautiful Soup.

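You have to read the contents of the file. open() only returns a file object, which is what your print shows, and mode 'w' opens the file for writing, which truncates it. Open the file for reading and call read() instead: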

# Open for reading ('r'); the with block closes the file automatically
with open(fname, 'r', encoding='utf-8') as html_file:
    source_code = html_file.read()
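
To make the difference between the file object and its contents concrete, here is a minimal, self-contained sketch. It assumes a throwaway file (the name example.htm and its markup are invented for illustration) rather than the asker's actual C:\Python34\html.htm, and it also shows why mode 'w' is risky for a file you want to read.

```python
import os
import tempfile

# Create a small throwaway HTML file so the example is self-contained.
# (The file name and contents are made up for illustration.)
tmp_dir = tempfile.mkdtemp()
fname = os.path.join(tmp_dir, "example.htm")
with open(fname, "w", encoding="utf-8") as f:
    f.write("<html><body><p>Hello</p></body></html>")

# open() returns a file object; its repr is what print() showed in the question.
with open(fname, "r", encoding="utf-8") as html_file:
    file_repr = repr(html_file)
    source_code = html_file.read()  # read() returns the file's text as a str

print(file_repr)    # a <_io.TextIOWrapper ...> description, not the HTML
print(source_code)  # the actual HTML source

# Opening an existing file with mode 'w' truncates it to zero length.
open(fname, "w").close()
with open(fname, "r", encoding="utf-8") as html_file:
    after_truncate = html_file.read()
print(repr(after_truncate))  # the earlier contents are gone
```

If the original file was already opened once with 'w', it is likely empty now and would need to be saved again before it can be read.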

I was trying to read a saved HTML file from a folder. I tried the code mentioned by Vikasa but got an error, so I changed the code and tried again, and it worked for me. The code is as follows:

    fname = 'page_source.html'  # this HTML file is stored in the same folder as the script
    with open(fname, 'r') as html_file:
        source_code = html_file.read()

Print the HTML page using:

    print(source_code)

This prints the content read from the page_source.html file.
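
Since the stated goal is to parse the source with Beautiful Soup, the following sketch shows one way the string could be handed over once it has been read. It assumes the third-party beautifulsoup4 package is installed, and the sample markup, title, and links are invented for illustration; only the file name page_source.html comes from the answer above.

```python
import os
import tempfile

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Write a small sample page so the sketch runs on its own.
# (The markup is invented for illustration.)
tmp_dir = tempfile.mkdtemp()
fname = os.path.join(tmp_dir, "page_source.html")
with open(fname, "w", encoding="utf-8") as f:
    f.write("<html><head><title>Demo</title></head>"
            "<body><a href='/one'>One</a><a href='/two'>Two</a></body></html>")

# Read the file contents into a string.
with open(fname, "r", encoding="utf-8") as html_file:
    source_code = html_file.read()

# Parse the string using Python's built-in "html.parser" backend.
soup = BeautifulSoup(source_code, "html.parser")

title = soup.title.string                         # text inside <title>
links = [a["href"] for a in soup.find_all("a")]   # href of every <a> tag
print(title)
print(links)
```

Passing "html.parser" explicitly avoids depending on an optional parser such as lxml being installed.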
