Retrieving the name of a class attribute with lxml

Question

I am working on a Python project that uses lxml to scrape a page, and I am having trouble retrieving the name of a span's class attribute. The HTML snippet is below:

<tr class="nogrid">
  <td class="date">12th January 2016</td> 
  <td class="time">11:22pm</td> 
  <td class="category">Clothing</td>   
  <td class="product">
    <span class="brand">carlos santos</span>
  </td> 
  <td class="size">10</td> 
  <td class="name">polo</td> 
</tr>
....

How do I retrieve the value of the class attribute of the span below?

<span class="brand">carlos santos</span>

Answer 1

You can use the following XPath to get the class attribute of the span element that is a direct child of the td with class product:

//td[@class="product"]/span/@class

Working demo:

from lxml import html
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td> 
<td class="time">11:22pm</td> 
<td class="category">Clothing</td>   
<td class="product">
<span class="brand">carlos santos</span>
</td> 
<td class="size">10</td> 
<td class="name">polo</td> 
</tr>'''

root = html.fromstring(raw)
span = root.xpath('//td[@class="product"]/span/@class')[0]
print(span)  # the XPath attribute result is a plain string

Output:

brand
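On a real page the table usually has many rows, a span may carry several classes, and some product cells may have no span at all, in which case indexing with [0] raises IndexError. The following is a minimal sketch of one way to handle those cases. The helper name span_classes and the sample markup are made up for illustration and are not part of the original question.

```python
from lxml import html

# Hypothetical sample: one cell with two classes, one cell without a span.
raw = '''<table>
<tr class="nogrid">
  <td class="product"><span class="brand featured">carlos santos</span></td>
</tr>
<tr class="nogrid">
  <td class="product">no span here</td>
</tr>
</table>'''

def span_classes(markup):
    """Return one list of class names per product cell (empty if no span)."""
    root = html.fromstring(markup)
    results = []
    for cell in root.xpath('//td[@class="product"]'):
        span = cell.find('span')  # direct child span, or None if absent
        value = span.get('class', '') if span is not None else ''
        results.append(value.split())  # "brand featured" -> ["brand", "featured"]
    return results

print(span_classes(raw))  # [['brand', 'featured'], []]
```

Using Element.get with a default avoids a KeyError-style failure when the attribute is missing, and splitting on whitespace mirrors how HTML treats class as a space-separated list.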

Answer 2

from bs4 import BeautifulSoup

# Named "markup" rather than "lxml" to avoid confusion with the lxml library.
markup = '''<tr class="nogrid">
          <td class="date">12th January 2016</td>
          <td class="time">11:22pm</td>
          <td class="category">Clothing</td>
          <td class="product">
            <span class="brand">carlos santos</span>
          </td>
          <td class="size">10</td>
          <td class="name">polo</td>
          </tr>'''
soup = BeautifulSoup(markup, 'lxml')
# BeautifulSoup treats class as a multi-valued attribute, so this is a list.
result = soup.find('span')['class']  # result == ['brand']
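Because BeautifulSoup returns class as a list, a span with several classes yields several entries, and a missing span makes find return None. A small sketch of a more defensive lookup follows. The sample markup is hypothetical, and the standard-library 'html.parser' backend is used here only as an assumption to keep the example free of extra parser dependencies; 'lxml' should behave the same for this fragment.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two classes on the span.
markup = '<td class="product"><span class="brand featured">carlos santos</span></td>'
soup = BeautifulSoup(markup, 'html.parser')

span = soup.find('span')
# Fall back to an empty list if there is no span or no class attribute.
classes = span.get('class', []) if span is not None else []

print(classes)            # ['brand', 'featured']
print(' '.join(classes))  # brand featured
```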
