How can I solve this particular “TypeError: Type 'NoneType' cannot be serialized.” error?

First, a brief description of the problem: within an unordered list, we have many list items, each of which corresponds to a "flashcard":

<ul>
    <li>
        <p><span>can you slice columns in a 2d list? </span></p>
        <pre><code class='language-python' lang='python'>queryMatrixTranspose[a-1:b][i] = queryMatrix[i][a-1:b] </code></pre>
        <ul>
            <li>
                <span>No: can&#39;t do this because python doesn&#39;t support multi-axis slicing, only multi-list slicing; see the article </span><a href='http://ilan.schnell-web.net/prog/slicing/' target='_blank' class='url'>http://ilan.schnell-web.net/prog/slicing/</a><span> for more info.</span> 
            </li>
        </ul>
    </li>
</ul>

The answer on the flashcard will always be a list item located under the xpath /html/body/ul/li/ul. I'd like to retrieve the answer in the format shown here:

    <li>
        <span>No: can&#39;t do this because python doesn&#39;t support multi-axis slicing, only multi-list slicing; see the article </span><a href='http://ilan.schnell-web.net/prog/slicing/' target='_blank' class='url'>http://ilan.schnell-web.net/prog/slicing/</a><span> for more info.</span> 
    </li>

The flashcard's question is everything that remains in the xpath: /html/body/ul/li after the answer has been extracted:

    <li>
        <p><span>can you slice columns in a 2d list? </span></p>
        <pre><code class='language-python' lang='python'>queryMatrixTranspose[a-1:b][i] = queryMatrix[i][a-1:b] </code></pre>
    </li>

For each flashcard in an unordered list of flashcards, I'd like to extract the utf-8 encoded html content of the question and answer list items. That is, I'd like to have both the text and html tags.


I tried to solve this problem by iterating through each flashcard and corresponding answer and removing the child-node answer from the parent-node flashcard.

flashcard_list = []
htmlTree = html.fromstring(htmlString)    
for flashcardTree,answerTree in zip(htmlTree.xpath("/html/body/ul/li"),
 htmlTree.xpath('/html/body/ul/li/ul')):

    flashcard = html.tostring(flashcardTree, 
        pretty_print=True).decode("utf-8")

    answer = html.tostring(answerTree, 
        pretty_print=True).decode("utf-8")

    question = html.tostring(flashcardTree.remove(answerTree), 
        pretty_print=True).decode("utf-8")

    flashcard_list.append((question,answer))

However, when I try to remove the answer child node with flashcardTree.remove(answerTree), I get the error TypeError: Type 'NoneType' cannot be serialized. I don't understand why this function would return None; I'm trying to remove the node at /html/body/ul/li/ul, which is a valid child node of /html/body/ul/li.

Whatever suggestions you have would be greatly appreciated. I'm not in any way attached to the code I wrote in my first attempt; I'll accept any answer where the output is a list of (question,answer) tuples, one for each flashcard.

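The TypeError comes from passing the return value of flashcardTree.remove(answerTree) to html.tostring: remove() modifies the tree in place and returns None, so there is nothing to serialize. If I understand correctly what you are looking for, and the plain text of each question and answer is enough, this should work: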

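# htmlTree is the tree parsed with html.fromstring(htmlString) in the question
flashcard_list = []
# Pair each question span with its answer <li>; selecting only the <li>
# (not its descendants) keeps the pairs aligned across several flashcards
for flashcardTree,answerTree in zip(htmlTree.xpath("/html/body/ul/li/p/span"),
 htmlTree.xpath('/html/body/ul/li/ul/li')):

    question = flashcardTree.text
    answer = answerTree.text_content().strip()
    flashcard_list.append((question,answer))

for i in flashcard_list:
    print(i[0],'\n',i[1])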

Output:

can you slice columns in a 2d list?
No: can't do this because python doesn't support multi-axis slicing, only multi-list slicing; see the article http://ilan.schnell-web.net/prog/slicing/ for more info.
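
If the HTML markup is needed as well as the text, as the question asks, one possible approach is to serialize the answer first, then remove the nested list, then serialize what is left of the flashcard. The sketch below is only an illustration under a few assumptions: the function name extract_flashcards and the SAMPLE string are hypothetical, the sample answer is shortened, and html.document_fromstring is used in place of html.fromstring so that the /html/body wrapper is guaranteed to exist.

```python
from lxml import html

# Hypothetical sample input mirroring the flashcard markup from the question
# (the answer text is shortened for illustration).
SAMPLE = """
<ul>
    <li>
        <p><span>can you slice columns in a 2d list? </span></p>
        <pre><code class='language-python' lang='python'>queryMatrixTranspose[a-1:b][i] = queryMatrix[i][a-1:b] </code></pre>
        <ul>
            <li>
                <span>No: python doesn't support multi-axis slicing.</span>
            </li>
        </ul>
    </li>
</ul>
"""


def extract_flashcards(html_string):
    """Return a list of (question_html, answer_html) tuples (hypothetical helper)."""
    # document_fromstring always wraps the input in <html><body>...</body></html>
    tree = html.document_fromstring(html_string)
    flashcards = []
    for card in tree.xpath("/html/body/ul/li"):
        answer_list = card.find("ul")  # the nested <ul> that holds the answer
        if answer_list is None:
            continue  # skip list items that have no answer
        # Serialize each answer <li> before touching the tree
        answer = "".join(
            html.tostring(li, encoding="unicode", with_tail=False)
            for li in answer_list.findall("li")
        )
        # remove() mutates card in place and returns None,
        # so its result must not be passed to html.tostring
        card.remove(answer_list)
        question = html.tostring(card, encoding="unicode", with_tail=False)
        flashcards.append((question, answer))
    return flashcards


if __name__ == "__main__":
    for question, answer in extract_flashcards(SAMPLE):
        print(question)
        print(answer)
```

Looking up the answer relative to each flashcard, instead of zipping two independent xpath results, keeps each question paired with its own answer even when some list items have no nested list.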
