In the first step of the html5lib tutorial I see some confusing behavior.
The docs say:
import html5lib
f = open("mydocument.html")
doc = html5lib.parse(f)
This will return a tree in a custom "simpletree" format.
My file is a normal HTML document, but in my case the result is:
<None>
>>> doc is None
False
I believe this is not right, but I have no idea what is happening.
If I call the read method on the opened file, it returns the file contents as a string:
f = open("mydocument.html")
f.read()
# returns string with html
And after doc = html5lib.parse(f), f.read() returns an empty string, as if the file had already been read.
The <None> doesn't really mean that your document was not parsed; it just means that your document has no name. If you do
doc.name = "test"
print(doc)
it should show <test>.
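parse can also take a string as its argument, so there is no need to pass an open file object. As far as I know, html5lib treats a string as the HTML markup itself rather than as a filename, so pass the document's contents, not its path.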
Try print(doc.toxml()) to see the parsed tree.
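As for f.read() returning an empty string after parsing: that is expected, because the parser reads the file object through to the end and a file object does not rewind itself. Here is a minimal sketch using only the standard library. The temporary file stands in for mydocument.html, and a plain read() stands in for the parser consuming the file; both are assumptions for illustration.

```python
import os
import tempfile

# Create a small HTML file so the example is self-contained.
# The markup and file name are placeholders, not from the original post.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html><body><p>Hello</p></body></html>")
    path = tmp.name

with open(path) as f:
    first = f.read()    # consumes the whole file, as a parser would
    second = f.read()   # position is at end of file, so nothing is left
    f.seek(0)           # rewind to the beginning
    third = f.read()    # the full contents are available again

os.remove(path)
```

So if you need the raw text after parsing, either call f.seek(0) first or read the file into a string once and reuse that string.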