Why are the Amazon Best Sellers Rank and ASIN data not coming back?
import scrapy


class Me2Spider(scrapy.Spider):
    name = 'me'
    allowed_domains = ['www.amazon.com']
    start_urls = [
        'https://www.amazon.com/dp/B08DL5SQDM?th=1',
        'https://www.amazon.com/dp/B08DL6D52S?th=1',
        'https://www.amazon.com/dp/B01LW14DG7?th=1'
    ]

    def parse(self, response):
        yield {
            'ASIN': response.xpath('//div[@class="a-section table-padding"]/table[@id="productDetails_detailBullets_sections1"]/tbody/tr[1]/td').get(),
            'Ranking': response.xpath('//*[@id="prodDetails"]/div/div[2]/div[2]/div/div[1]/span[3]/text()').get(),
        }
I've scraped pages this way before, but now no data comes back.
The problem is in the XPath. You are getting None because the expression does not match the element you are looking for.
If you look at the markup of the Amazon page, you can see that the ASIN is inside a table. Specifically, it looks like this:
<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">
  <tbody>
    <tr>
      <th class="a-color-secondary a-size-base prodDetSectionEntry">
        ASIN
      </th>
      <td class="a-size-base">
        B08DL5SQDM
      </td>
    </tr>
So you can get the ASIN by finding the th tag whose text is ASIN and then taking the td that follows that th element.
Try this code:
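from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes Chrome and a matching driver are installed
driver.get("https://www.amazon.com/dp/B08DL6D52S?th=1")
# Find the <th> whose trimmed text is "ASIN", then take its sibling <td>.
path = "//th[normalize-space() = 'ASIN']/following-sibling::td"
element = driver.find_element(By.XPATH, path)
print(element.text)
driver.quit()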
According to Mozilla, normalize-space is defined as:
The normalize-space function strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string.
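To see why this matters for the th cell above, here is a rough Python approximation of what normalize-space does. `normalize_space` is an illustrative helper, not a library function, and Python's `str.split()` treats a few more characters as whitespace than XPath does, so take it as a sketch only.

```python
def normalize_space(text: str) -> str:
    """Approximate XPath normalize-space() for a Python string."""
    # split() with no arguments splits on runs of whitespace and drops
    # leading and trailing whitespace; join puts single spaces back.
    return " ".join(text.split())


# The header cell in the page markup carries newlines and indentation.
raw_header = "\n            ASIN\n        "

print(raw_header == "ASIN")                   # False: an exact text match fails
print(normalize_space(raw_header) == "ASIN")  # True: the trimmed text matches
```

This is why the predicate compares `normalize-space()` to 'ASIN' instead of comparing the raw text.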