XPath cannot select just one HTML tag

Question

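I am trying to scrape some data from a website, but the code below returns every matching element, and I only want the first match. I tried extract_first, but it returned nothing.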

# -*- coding: utf-8 -*-
import scrapy
from gumtree.items import GumtreeItem



class FlatSpider(scrapy.Spider):
    name = "flat"
    allowed_domains = ["gumtree.com"]
    start_urls = (
        'https://www.gumtree.com/flats-for-sale',
    )

    def parse(self, response):
        item = GumtreeItem()
        item['title'] = response.xpath('//*[@class="listing-title"][1]/text()').extract()
        return item

How can I select only one element with an XPath selector?

Answer 1

This is because the first element is actually empty. Filter out the empty values and use extract_first(); this worked for me:

$ scrapy shell "https://www.gumtree.com/flats-for-sale" -s USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36"
In [1]: response.xpath('//*[@class="listing-title"][1]/text()[normalize-space(.)]').extract_first().strip()
Out[1]: u'REDUCED to sell! Stunning Hove sea view flat.'
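
The same filtering can also be done on the Python side after calling .extract(). The following is only a minimal sketch: the helper name first_non_empty and the sample list are made up for illustration and are not part of Scrapy.

```python
def first_non_empty(values):
    """Return the first value that is not blank (stripped), or None if all are blank."""
    for value in values:
        stripped = value.strip()
        if stripped:
            return stripped
    return None


# Made-up sample of what .extract() might return when the
# first text node contains only whitespace.
extracted = ["\n", "REDUCED to sell! Stunning Hove sea view flat.", "\n"]
title = first_non_empty(extracted)
print(title)
```

Filtering inside the XPath expression with normalize-space(.), as in the shell session above, avoids pulling the blank strings into Python in the first place.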

Answer 2

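Strictly speaking, it should be response.xpath('(//*[@class="listing-title"])[1]/text()'). But if you want the title of every ad (for example, to create one item per ad), you should probably do this instead: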

for article in response.xpath('//article[@data-q]'):
    item = GumtreeItem()
    item['title'] = article.css('.listing-title::text').extract_first()
    yield item
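
To see why the [1] in the question still returns several elements, here is a small sketch using the standard library's xml.etree.ElementTree on made-up markup. This is an assumption-laden illustration: ElementTree implements only a subset of XPath and does not accept the parenthesised form, so find() stands in for "first match in the whole document".

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every <article> holds exactly one listing title.
PAGE = """<div>
  <article><h2 class="listing-title">First flat</h2></article>
  <article><h2 class="listing-title">Second flat</h2></article>
  <article><h2 class="listing-title">Third flat</h2></article>
</div>"""

root = ET.fromstring(PAGE)

# The position predicate [1] is evaluated relative to each parent,
# so the first title inside every <article> matches.
per_parent = [h2.text for h2 in root.findall(".//h2[@class='listing-title'][1]")]

# find() returns only the first match in document order.
first = root.find(".//h2[@class='listing-title']")
first_only = first.text if first is not None else None

print(per_parent)
print(first_only)
```

In Scrapy, the parenthesised expression from the answer plays the role of find() here: the parentheses collect all matches first, and [1] is then applied once to that whole set.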
