
How to print/get specific lines in an HTML file in Python 3

I want to print a specific line from my HTML file, namely the line that is enclosed as a header. My test.html file is posted at the bottom for reference.

import codecs
import re
f = codecs.open("test.html", 'r')
f.read()
paragraphs = re.findall(r'<html>(.*?)</html>',str(f))
print(paragraphs)
f.close()

test.html looks like this:

<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>

You could do something like this:

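# Read the whole file into one string; the file is closed automatically
with open("test.html", "r") as g:
    f = g.read()

start_tag = "<head>"
# .find() gives the index where the tag begins, so skip past the tag itself
start = f.find(start_tag) + len(start_tag)
end = f.find("</head>")
# strip() drops the newlines left just inside the two tags
paragraphs = f[start:end].strip()
print(paragraphs)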

This prints:

<title>
Example
</title>

.find() returns the index at which the substring starts within the string you searched. We then adjust those indexes so that the slice begins just after the opening tag and stops just before the closing tag, and slice the string with [:] to extract the text in between.
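
The same idea can be wrapped in a small helper so it works for any pair of tags, including the <h1> header from the question. This is only a sketch: text_between is a hypothetical name, not something from a library, and the HTML is inlined as a string so the example runs without test.html. It also checks for -1, which is what .find() returns when the substring is not present.

```python
def text_between(text, open_tag, close_tag):
    """Return the text between open_tag and close_tag, or None if either is missing."""
    start = text.find(open_tag)
    if start == -1:  # .find() returns -1 when the substring is absent
        return None
    start += len(open_tag)  # move past the opening tag itself
    end = text.find(close_tag, start)  # only search after the opening tag
    if end == -1:
        return None
    return text[start:end].strip()  # strip() removes surrounding newlines


# Same content as test.html, inlined here for illustration
html = """<html>
<head>
<title>
Example
</title>
</head>
<body>
<h1>Hello, world</h1>
</body>
</html>
"""

print(text_between(html, "<h1>", "</h1>"))  # Hello, world
print(text_between(html, "<head>", "</head>"))
```

Note that this only finds the first occurrence of each tag and does not handle tags with attributes (for example <h1 class="x">). For anything beyond simple files like this one, an HTML parser such as the standard library's html.parser is likely a safer choice.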
