Retrieving the name of a class attribute with lxml
I am working on a Python project that uses lxml to scrape a page, and I am having trouble retrieving the value of a span's class attribute. The HTML snippet is below:
<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>
....
How do I retrieve the value of the class attribute of the span below?
<span class="brand">carlos santos</span>
You can use the following XPath to get the class attribute of the span element that is a direct child of the td with class product:
//td[@class="product"]/span/@class
Working demo example:
from lxml import html
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>'''
root = html.fromstring(raw)
span = root.xpath('//td[@class="product"]/span/@class')[0]
print(span)
Output:
brand
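Indexing the XPath result with [0] raises an IndexError when no matching span exists. A slightly more defensive variant is sketched below, assuming you would rather get None back in that case. The helper name span_class and the shortened sample markup are hypothetical and only for illustration; they are not part of lxml or the original page.

```python
from lxml import html


def span_class(fragment):
    """Return the class of the span inside td.product, or None if absent.

    Hypothetical helper for illustration; not part of lxml.
    """
    root = html.fromstring(fragment)
    # xpath() returns a list of matching elements, possibly empty
    spans = root.xpath('//td[@class="product"]/span')
    if not spans:
        return None
    # Element.get() returns None when the attribute itself is missing
    return spans[0].get('class')


# Shortened sample markup (an assumption, modelled on the snippet above)
sample = ('<table><tr><td class="product">'
          '<span class="brand">carlos santos</span></td></tr></table>')
missing = '<table><tr><td class="size">10</td></tr></table>'

print(span_class(sample))
print(span_class(missing))
```

Selecting the element rather than the attribute also leaves the span available if you later need its text (spans[0].text) alongside the class.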
The same value can also be read with BeautifulSoup:
from bs4 import BeautifulSoup
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>'''
# Parse the snippet using lxml as the parser backend
soup = BeautifulSoup(raw, 'lxml')
# class is a multi-valued attribute, so BeautifulSoup returns a list
result = soup.find('span')['class']  # result == ['brand']
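Note that soup.find('span') picks the first span anywhere in the document. If the page contains other spans, it may help to scope the lookup to the product cell. The following is only a sketch under assumptions: it uses the built-in 'html.parser' backend so it runs without lxml installed, a shortened sample string modelled on the snippet above, and a CSS selector via select_one.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (an assumption, modelled on the snippet above)
snippet = ('<table><tr><td class="product">'
           '<span class="brand">carlos santos</span></td></tr></table>')

soup = BeautifulSoup(snippet, 'html.parser')

# Select only a span that is a direct child of td.product
span = soup.select_one('td.product > span')

# get() with a default avoids a KeyError if the span has no class attribute
classes = span.get('class', []) if span is not None else []

print(classes)            # the class names as a list
print(' '.join(classes))  # the same names joined into one string
```

Joining the list gives back a plain string comparable to what the XPath @class query returns.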