
How to print/get specific lines in an HTML file in Python 3

I want to print a specific line from my HTML file, namely the one enclosed as a header. My test.html file is posted at the bottom for reference.

import codecs
import re
f = codecs.open("test.html", 'r')
f.read()
paragraphs = re.findall(r'<html>(.*?)</html>',str(f))
print(paragraphs)
f.close()

test.html looks like this:

<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>

You could do something like this:

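# Read the whole file into a single string; the with block closes it for us.
with open("test.html", "r") as g:
    f = g.read()

# find() gives the index where each tag starts (or -1 if it is absent).
start = f.find("<head>") + len("<head>")  # step past the opening tag
end = f.find("</head>")

# Slice out what lies between the tags and trim the surrounding newlines.
paragraphs = f[start:end].strip()
print(paragraphs)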

This prints:

<title>
Example
</title>

.find() returns the index at which the substring starts inside the string you searched (or -1 if it is not found). We then adjust those indexes with some simple arithmetic, so that the tags themselves are skipped, and use them to slice the string with [start:end].
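If the line you are after is the <h1> heading rather than the <head> block, the same find-and-slice idea can be wrapped in a small helper. This is only a sketch: extract_between is a hypothetical name, not something from the standard library, and the HTML is pasted inline as a string so the snippet runs without test.html on disk.

```python
def extract_between(text, open_tag, close_tag):
    """Return the text between open_tag and close_tag, or None if either is missing."""
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)  # move past the opening tag itself
    end = text.find(close_tag, start)  # only look after the opening tag
    if end == -1:
        return None
    return text[start:end].strip()


# Inline copy of test.html so the sketch is self-contained.
html = """<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>
"""

print(extract_between(html, "<h1>", "</h1>"))  # Hello, world
print(extract_between(html, "<head>", "</head>"))
```

Note that this only matches the tags exactly as written, so something like <h1 class="big"> would not be found. For anything beyond a simple file like this one, a real HTML parser (for example the standard library's html.parser module) is the more robust choice.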
