Reading text in elements using lxml.etree

Question

I am using the Python version of the lxml library. I am currently trying to parse the text from a table, but I am running into a problem because some of the text is inside links.

For example, one of the cells may look something like this:

<td>
    Can I kick it, <a>to all the people</a> who can quest like a <a>tribe</a> does
</td>

Say that after parsing the HTML, the td element is stored as foo. Then foo.text will not return the whole text, only the part before the first link. Moreover, if I get the link text using [i.text for i in foo.getchildren()], I no longer know the order in which to interleave the non-link text and the link text.

Is there an easy way to get around this?

Answer 1

Well, after searching for an hour, I found the solution within 2 minutes of posting this question.

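Use the method foo.text_content(). It returns the full text of the element, including the text inside the links, in document order. Note that text_content() is available on elements parsed with lxml.html; for plain lxml.etree elements, "".join(foo.itertext()) gives the same result.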
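
As a rough illustration of the above, here is a minimal sketch using the cell from the question. The variable names (snippet, foo_html, foo_xml) and the helper cell_text are made up for this example, and the cell is wrapped in a table only so the HTML parser keeps the td in place.

```python
from lxml import etree, html

# The cell from the question, written on one line so there is no
# leading or trailing whitespace to strip.
snippet = (
    "<table><tr><td>Can I kick it, <a>to all the people</a> "
    "who can quest like a <a>tribe</a> does</td></tr></table>"
)


def cell_text(element):
    """Join every text node under element in document order.

    Hypothetical helper for this example; itertext() works on
    both lxml.etree and lxml.html elements.
    """
    return "".join(element.itertext())


# Parsed with lxml.html: text_content() is available.
foo_html = html.fromstring(snippet).find(".//td")
html_text = foo_html.text_content()

# Parsed with lxml.etree: no text_content(), so use itertext().
foo_xml = etree.fromstring(snippet).find(".//td")
etree_text = cell_text(foo_xml)

# .text alone stops at the first child element; the rest of the
# cell's own text lives in the .tail of each link.
first_part = foo_xml.text
link_tails = [a.tail for a in foo_xml]

print(html_text)
print(etree_text)
```

Both approaches walk the text nodes in document order, so the link text ends up interleaved with the surrounding text where it belongs. If the real cell has indentation around the text, as in the question, calling .strip() on the result should tidy it up.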

solution1
1 ACCEPTED 2013-09-23 00:38:16
