How to parse HTML using the lxml.html library

Question

Here is the HTML that appears on my site:

<meta content="auth" name="param" />
<meta content="I_WANT_THIS" name="token" />

How can I use lxml.html to grab the content value (I_WANT_THIS) of the meta tag whose name is "token"?

Answer 1

Use XPath to find the meta tag by its name attribute and get the value of its content attribute:

from lxml.html import fromstring


html_data = """ <meta content="auth" name="param" />
 <meta content="I_WANT_THIS" name="token" />"""

tree = fromstring(html_data)
# xpath() returns a list of matching attribute values
print(tree.xpath('//meta[@name="token"]/@content'))

prints:

['I_WANT_THIS']
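
Since xpath() returns a list, a page without the tag yields an empty list rather than an error. A minimal sketch of a helper that returns a single value or None is below; the name get_meta_content and the "csrf" lookup are made up for illustration and are not part of lxml. It passes the tag name as an XPath variable ($name), which lxml supports through keyword arguments to xpath().

```python
from lxml.html import fromstring

html_data = """<meta content="auth" name="param" />
<meta content="I_WANT_THIS" name="token" />"""


def get_meta_content(html_text, name):
    """Return the content of the first <meta> tag with the given name, or None.

    Hypothetical helper for illustration; not part of lxml.
    """
    tree = fromstring(html_text)
    # $name is an XPath variable bound by the keyword argument below,
    # so the expression is not built by string formatting.
    matches = tree.xpath('//meta[@name=$name]/@content', name=name)
    return matches[0] if matches else None


token = get_meta_content(html_data, "token")
missing = get_meta_content(html_data, "csrf")  # no such tag in html_data
print(token)    # I_WANT_THIS
print(missing)  # None
```

Indexing the result directly (tree.xpath(...)[0]) would raise IndexError when the tag is absent, which is why the sketch checks the list first.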

solution1
2 ACCEPTED 2014-03-12 21:47:32
