Accessing first element of output in lxml.html

Question

With lxml.html, how do I access single elements without using a for loop?

This is the HTML:

<tr class="headlineRow">
  <td>
    <span class="headline">This is some awesome text</span>
  </td>
</tr>

For example, this will fail with IndexError:

 for row in doc.cssselect('tr.headlineRow'):
     headline = row.cssselect('td span.headline')
     print(headline[0])

This will pass:

 for row in doc.cssselect('tr.headlineRow'):
     headline = row.cssselect('td span.headline')
     for first_thing in headline:
         print(headline[0].text_content())
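One plausible explanation, which is an assumption because the thread never confirms it: some `tr.headlineRow` row in the real page has no matching span, so the selector returns an empty list. `headline[0]` then raises IndexError, while the inner `for` loop silently runs zero times for that row. The sketch below guards the index instead. The sample markup and the `first_or_none` helper are hypothetical, and it uses `xpath()` rather than `cssselect()` only because recent lxml versions need the separate `cssselect` package for the latter; the same guard applies to the list `cssselect()` returns.

```python
import lxml.html

# Hypothetical markup: the second row has no <span class="headline">,
# which is the situation assumed to trigger the IndexError.
HTML = """<table>
  <tr class="headlineRow"><td><span class="headline">First headline</span></td></tr>
  <tr class="headlineRow"><td>No headline span here</td></tr>
</table>"""


def first_or_none(results):
    """Hypothetical helper: return the first item of a result list, or None if it is empty."""
    return results[0] if results else None


doc = lxml.html.fromstring(HTML)
headlines = []
for row in doc.xpath('//tr[@class="headlineRow"]'):
    # xpath() always returns a list, possibly an empty one
    span = first_or_none(row.xpath('./td/span[@class="headline"]'))
    # Compare with None explicitly: the truth value of an lxml element
    # depends on whether it has children, not on whether it exists.
    headlines.append(span.text_content() if span is not None else None)

print(headlines)  # ['First headline', None]
```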

Answer 1

I usually use the xpath method for things like this. It returns a list of matching elements.

>>> spans = doc.xpath('//tr[@class="headlineRow"]/td/span[@class="headline"]')
>>> spans[0].text
'This is some awesome text'
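A self-contained version of this answer might look like the following. It assumes the fragment is wrapped in a `<table>` so the HTML parser sees the `<tr>` in its usual context. It also shows XPath's `string()` function, which returns the text of the first matching node directly (an empty string when nothing matches), so there is no list to index.

```python
import lxml.html

# The fragment from the question, wrapped in <table> (an assumption) so the
# HTML parser sees the <tr> in a normal context.
HTML = """<table>
<tr class="headlineRow">
  <td>
    <span class="headline">This is some awesome text</span>
  </td>
</tr>
</table>"""

doc = lxml.html.fromstring(HTML)

# A location path returns a list of matching elements...
spans = doc.xpath('//tr[@class="headlineRow"]/td/span[@class="headline"]')
print(spans[0].text)  # This is some awesome text

# ...while string(...) converts the first match to its text content.
text = doc.xpath('string(//tr[@class="headlineRow"]/td/span[@class="headline"])')
print(text)  # This is some awesome text
```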

Answer 2

I tried out your example using CSSSelector and headline[0] worked fine. See below:

>>> html = """<tr class="headlineRow">
...   <td>
...     <span class="headline">This is some awesome text</span>
...   </td>
... </tr>"""
>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector
>>> doc = etree.fromstring(html)
>>> sel1 = CSSSelector('tr.headlineRow')
>>> sel2 = CSSSelector('td span.headline')
>>> for row in sel1(doc):
...     headline = sel2(row)
...     print(headline[0])
...
<Element span at 8f31e3c>

Answer 3

Elements are accessed the same way you access nested lists:

>>> doc[0][0]
<Element span at ...>

Or via CSS selectors:

doc.cssselect('td span.headline')[0]
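To make the nested-list analogy concrete, here is a small sketch. It assumes `doc` is the `<tr>` itself, which is what `etree.fromstring()` returns for this well-formed fragment. Whitespace between tags is stored as text rather than as child elements, so `doc[0]` is the `<td>` and `doc[0][0]` is the `<span>`. It also shows `find()`, which returns only the first match (or `None`) instead of a list.

```python
from lxml import etree

# The question's fragment is well-formed XML, so etree.fromstring() parses it
# and returns the <tr> itself as the root element.
HTML = """<tr class="headlineRow">
  <td>
    <span class="headline">This is some awesome text</span>
  </td>
</tr>"""

doc = etree.fromstring(HTML)
td = doc[0]        # first child element of <tr>
span = doc[0][0]   # first child element of <td>
print(td.tag, span.tag)  # td span
print(span.text)         # This is some awesome text

# find() returns the first matching element (or None), not a list
first = doc.find('td/span[@class="headline"]')
print(first.text)        # This is some awesome text
```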

Answer 4

Your "failing" example works perfectly for me. Either you made a mistake when trying it out, or you are using an older version of lxml with a bug that has since been fixed. I tried 2.2.6 and 2.1.1 (the oldest I had around), and both worked.
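To compare your setup against the versions mentioned above, you can print the version constants that `lxml.etree` exposes. This is a minimal sketch; the exact values printed depend on your installation.

```python
from lxml import etree

# Version of lxml itself, as a tuple of ints and as a string
print(etree.LXML_VERSION)
print(etree.__version__)

# Version of the libxml2 library that lxml is running against
print(etree.LIBXML_VERSION)
```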

4 answers

solution1 1 2010-09-20 06:14:54

solution2 0 ACCEPTED 2010-08-26 07:02:18

solution3 0 2010-08-26 07:43:41

solution4 0 2010-08-26 09:02:06