
html5lib returns <None>

In the first step of the html5lib tutorial I see some pretty confusing behavior.

The docs say:

import html5lib
f = open("mydocument.html")
doc = html5lib.parse(f)

This will return a tree in a custom "simpletree" format.
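For comparison, here is a minimal sketch of the same call with a current html5lib release (1.x is assumed). It opens the file in binary mode inside a with block, which lets html5lib detect the encoding itself and closes the file afterwards. Current releases default to the "etree" tree builder, so the result is an xml.etree.ElementTree element rather than a "simpletree" node. The temporary sample file exists only so the sketch runs on its own.

```python
import os
import tempfile

import html5lib

# Create a small sample file so this sketch is self-contained;
# "mydocument.html" is just the placeholder name used in the docs.
path = os.path.join(tempfile.mkdtemp(), "mydocument.html")
with open(path, "w", encoding="utf-8") as out:
    out.write("<!DOCTYPE html><title>Demo</title><p>Hello</p>")

# Binary mode lets html5lib detect the encoding itself;
# the with block closes the file once parsing is done.
with open(path, "rb") as f:
    doc = html5lib.parse(f)

print(doc is None)  # False: parse returns the root of the tree
print(doc.tag)      # {http://www.w3.org/1999/xhtml}html
```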

My file is an ordinary HTML document, but in my case doc shows up as:

<None>
>>> doc is None
False

I don't think that is right, but I have no idea what is happening.

Edit:

If I call the read method on the opened file, it returns the file contents as a string:

f = open("mydocument.html")
f.read()
# returns string with html

But after doc = html5lib.parse(f), f.read() returns an empty string, as if the file had already been read.
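The empty string is expected: reading a file object moves its position to the end, and a parser that reads the whole stream (as html5lib.parse did here) leaves it there. A stdlib-only sketch of the effect, where consume is a hypothetical stand-in for the parser and io.StringIO stands in for the opened file:

```python
import io


def consume(stream):
    """Hypothetical stand-in for a parser: reads the stream to the end."""
    return stream.read()


f = io.StringIO("<p>Hello</p>")
text = consume(f)        # reads everything, leaving the position at the end
print(repr(f.read()))    # '' because nothing is left to read
f.seek(0)                # rewind to the start of the stream
print(repr(f.read()))    # '<p>Hello</p>'
```

If you need both the raw text and the tree, read the file once and pass the resulting string to html5lib.parse, or call f.seek(0) before reading again.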

  • The <None> doesn't mean that your document wasn't parsed; it just means that your document has no name. If you do

     doc.name = "test"
     print(doc)

    it should show <test>

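  • parse can also take a string as its argument. In current html5lib releases that string is treated as the HTML markup itself, not as a filename, so you still need to read the file first if you don't pass an open file object.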

  • Try print(doc.toxml()).
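With a current html5lib release (1.x assumed), parse defaults to the "etree" tree builder, so the simpletree toxml() method may not be available. A sketch of parsing a string and inspecting the result with the standard xml.etree.ElementTree API instead; passing namespaceHTMLElements=False is a choice made here to keep tag names plain for find() paths.

```python
import xml.etree.ElementTree as ET

import html5lib

markup = "<title>Demo</title><p>Hello, <b>world</b>!</p>"

# A str argument is parsed as markup. namespaceHTMLElements=False keeps
# tag names like "p" instead of "{http://www.w3.org/1999/xhtml}p".
doc = html5lib.parse(markup, namespaceHTMLElements=False)

print(doc.tag)                        # html
print(doc.find("head/title").text)    # Demo
p = doc.find("body/p")
print("".join(p.itertext()))          # Hello, world!

# Serialize the whole tree back to a string for inspection.
print(ET.tostring(doc, encoding="unicode"))
```

The parser adds the missing html, head and body elements, which is why the find() paths work even though the markup never spells them out.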
