Why are the Amazon Best Sellers Rank and ASIN data not coming back?

Question

import scrapy


class Me2Spider(scrapy.Spider):
    name = 'me'
    allowed_domains = ['www.amazon.com']
    start_urls = [
        'https://www.amazon.com/dp/B08DL5SQDM?th=1',
        'https://www.amazon.com/dp/B08DL6D52S?th=1',
        'https://www.amazon.com/dp/B01LW14DG7?th=1',
    ]

    def parse(self, response):
        # Both expressions currently return None
        yield {
            'ASIN': response.xpath('//div[@class="a-section table-padding"]/table[@id="productDetails_detailBullets_sections1"]/tbody/tr[1]/td').get(),
            'Ranking': response.xpath('//*[@id="prodDetails"]/div/div[2]/div[2]/div/div[1]/span[3]/text()').get(),
        }

I have scraped these fields this way before, but now the data no longer comes back.

Answer 1

The problem is in the XPath. You get None because the expression does not match the element you are looking for.

If you look at the markup of the Amazon page, you can see that the ASIN sits inside a table. Specifically, it looks like this:

<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">
    <tbody>
        <tr>
            <th class="a-color-secondary a-size-base prodDetSectionEntry">
                ASIN
            </th>
            <td class="a-size-base">
                B08DL5SQDM
            </td>
        </tr>

So you can reach the ASIN value by finding the th tag whose text is ASIN and then taking the td that follows that th element.

Try this code:

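from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
url = "https://www.amazon.com/dp/B08DL6D52S?th=1"
driver.get(url)

# Find the <th> whose text is "ASIN", then take the <td> next to it
path = "//th[normalize-space() = 'ASIN']/following-sibling::td"
element = driver.find_element(By.XPATH, path)
print(element.text)
driver.quit()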

According to Mozilla (MDN), normalize-space is defined as:

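The normalize-space function strips leading and trailing whitespace from a string, replaces sequences of whitespace characters with a single space, and returns the resulting string.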
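
To see why this matters for the ASIN header cell, here is a small Python sketch that mimics the behaviour described in that definition. The normalize_space function below is a hypothetical stand-in written for illustration, not part of any XPath library; it assumes the XPath 1.0 notion of whitespace (space, tab, carriage return, line feed).

```python
import re

# XPath 1.0 treats space, tab, carriage return and line feed as whitespace
_XPATH_WS = " \t\r\n"


def normalize_space(text):
    """Mimic XPath normalize-space(): trim the ends, collapse inner runs to one space."""
    return re.sub(f"[{_XPATH_WS}]+", " ", text).strip(_XPATH_WS)


# The text of the <th> cell as it appears in the markup, indentation included
raw_header = "\n                ASIN\n            "

print(raw_header == "ASIN")                   # False
print(normalize_space(raw_header) == "ASIN")  # True
```

A plain comparison such as text() = 'ASIN' would fail on this cell because of the surrounding line breaks and indentation, which is why the expression compares the normalized text instead.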

Solution 1
0 2020-09-29 04:22:43
