import scrapy


class Me2Spider(scrapy.Spider):
    name = 'me'
    allowed_domains = ['www.amazon.com']
    start_urls = [
        'https://www.amazon.com/dp/B08DL5SQDM?th=1',
        'https://www.amazon.com/dp/B08DL6D52S?th=1',
        'https://www.amazon.com/dp/B01LW14DG7?th=1'
    ]

    def parse(self, response):
        yield {
            'ASIN': response.xpath('//div[@class="a-section table-padding"]/table[@id="productDetails_detailBullets_sections1"]/tbody/tr[1]/td').get(),
            'Ranking': response.xpath('//*[@id="prodDetails"]/div/div[2]/div[2]/div/div[1]/span[3]/text()').get(),
        }
I've scraped pages like this before, but now no data comes back.
The problem is in the XPath. The expression does not match the element you want, which is why .get() returns None.
If you look at the markup of the Amazon page, you can see that the ASIN is inside a table. Specifically, it looks like this:
<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">
<tbody>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
ASIN
</th>
<td class="a-size-base">
B08DL5SQDM
</td>
</tr>
So you can get the ASIN by finding the th tag whose text is ASIN and then taking the td that follows it in the same row.
Try this code:
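from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()  # assumes Chrome is installed locally
url = "https://www.amazon.com/dp/B08DL6D52S?th=1"
driver.get(url)
# Find the <th> whose trimmed text is "ASIN", then take its sibling <td>
path = "//th[normalize-space() = 'ASIN']/following-sibling::td"
element = driver.find_element(By.XPATH, path)  # find_element_by_xpath was removed in Selenium 4
print(element.text)
driver.quit()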
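The same "find the th, then read the td next to it" lookup can be tried offline against the table markup shown earlier. The sketch below is only an illustration under some assumptions: it uses the standard library's ElementTree, which works here because the sample fragment is well-formed (closing tags were added), whereas a real Amazon page needs a proper HTML parser. The helper name find_detail is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table markup from the product page; closing tags were
# added so the fragment parses on its own.
SAMPLE = """
<table id="productDetails_detailBullets_sections1">
  <tbody>
    <tr>
      <th class="a-color-secondary a-size-base prodDetSectionEntry">
        ASIN
      </th>
      <td class="a-size-base">
        B08DL5SQDM
      </td>
    </tr>
  </tbody>
</table>
"""


def find_detail(table_markup, label):
    """Return the text of the <td> in the row whose <th> reads `label`."""
    root = ET.fromstring(table_markup)
    for row in root.iter("tr"):
        header = row.find("th")
        value = row.find("td")
        if header is None or value is None:
            continue
        # strip() stands in for normalize-space() in this simple case
        if (header.text or "").strip() == label:
            return (value.text or "").strip()
    return None


print(find_detail(SAMPLE, "ASIN"))  # B08DL5SQDM
```

Since the question uses Scrapy rather than Selenium, note that the XPath itself is not Selenium-specific: Scrapy selectors accept XPath 1.0 expressions, so the same th/following-sibling expression can be passed to response.xpath(...) with /text() appended, followed by .get().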
According to the Mozilla (MDN) documentation, normalize-space is defined as follows:
The normalize-space function strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string.
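To see why this matters for the th cell, whose text is surrounded by newlines and indentation, here is a rough Python equivalent of that definition. It is a sketch rather than the XPath engine's own code: the function name normalize_space is made up for illustration, and it assumes XPath 1.0's notion of whitespace (space, tab, carriage return, line feed).

```python
import re

# XPath 1.0 treats space, tab, carriage return and line feed as whitespace.
_XPATH_WS = " \t\r\n"


def normalize_space(text):
    """Mimic XPath normalize-space() for a plain Python string."""
    # Collapse every run of whitespace into a single space...
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    # ...then drop the leading and trailing whitespace.
    return collapsed.strip(_XPATH_WS)


raw_header = "\n        ASIN\n      "  # text of the <th> as it appears in the page
print(raw_header == "ASIN")                   # False: a plain comparison fails
print(normalize_space(raw_header) == "ASIN")  # True
```

This is why a predicate such as th[text() = 'ASIN'] would miss the cell, while th[normalize-space() = 'ASIN'] matches it.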