I am working on a web scraping project in which I scrape information from the Amazon website. The product page contains an unordered list with information like this:
Item Weight: 17.2 pounds
Shipping Weight: 17.4 pounds (View shipping rates and policies)
ASIN: B00HC767P6
UPC: 766789717088 885720483186 052000201628
Item model number: mark-1hooi-toop842
Customer Reviews: 4.8 out of 5 stars 1,352 customer ratings
Amazon Best Sellers Rank: #514 in Grocery & Gourmet Food (See Top 100 in Grocery & Gourmet Food)
#12 in Sports Drinks
The list itself has no class, and the li tags have no specific class or ID either. The problem is that I do not want all the information in the list, only the ASIN code. Here is the link to the product details page.
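Because every entry in the list follows a "Label: value" pattern, one option is to turn the list's text into a dictionary and look up the ASIN by its label. This is a minimal sketch, assuming you already have the list's text with one entry per line (for example the .text of the ul element). Lines without a ": " separator, such as the "#12 in Sports Drinks" continuation, are skipped. The function name parse_details is hypothetical.

```python
def parse_details(text):
    """Turn 'Label: value' lines into a dict; skip lines without ': '."""
    details = {}
    for line in text.splitlines():
        # partition splits on the first ': ' only; sep is '' if it is absent
        label, sep, value = line.partition(": ")
        if sep:
            details[label.strip()] = value.strip()
    return details


sample = """Item Weight: 17.2 pounds
ASIN: B00HC767P6
UPC: 766789717088 885720483186 052000201628
#12 in Sports Drinks"""

details = parse_details(sample)
print(details["ASIN"])  # B00HC767P6
```

Looking the value up by label means the ASIN is found wherever it appears in the list.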
Before Selenium, I was using BeautifulSoup, and this is how I tackled the issue:
asin = str(soup.find('bdi', {'dir': 'ltr'}).find_parent('li'))[38:].split('<')[0]
I am now switching to Selenium. How do I scrape this information?
You can use a CSS selector to get the relevant li item, as follows:
$(".content > ul > li:nth-child(2)").textContent >>> "Shipping Weight: 33 pounds (View shipping rates and policies)"
$(".content > ul > li:nth-child(3)").textContent >>> "ASIN: B07QKN2ZT9"
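The equivalent Python Selenium code (recent Selenium 4 releases removed the find_element_by_* methods, so this uses find_element with By):
from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, ".content > ul > li:nth-child(3)").text.split(": ")[1] >>> 'B07QKN2ZT9'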
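If you would rather not depend on the li position at all, another option is to take the text of the whole list and pull the ASIN out with a regular expression. This sketch assumes the label reads exactly "ASIN:" and that the value is a 10-character uppercase alphanumeric code, which matches both ASINs shown on this page. The function name find_asin is hypothetical.

```python
import re

# "ASIN:" label, optional whitespace, then a 10-character code (assumed format).
ASIN_PATTERN = re.compile(r"\bASIN:\s*([A-Z0-9]{10})\b")


def find_asin(list_text):
    """Return the ASIN found in the list text, or None if there is none."""
    match = ASIN_PATTERN.search(list_text)
    return match.group(1) if match else None


text = "Shipping Weight: 33 pounds (View shipping rates and policies)\nASIN: B07QKN2ZT9"
print(find_asin(text))                 # B07QKN2ZT9
print(find_asin("UPC: 766789717088"))  # None
```

With Selenium 4, the list text could come from something like driver.find_element(By.CSS_SELECTOR, ".content > ul").text. Unlike split(": ")[1], this returns None instead of raising IndexError when no ASIN is present.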
If the ASIN is not always at the same position, you can instead find the bdi element whose text is ASIN, go up to its ancestor::li, then take that li's text and extract the relevant part, like this:
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "//bdi[text()='ASIN']/ancestor::li").text.split(": ")[1] >>> 'B07QKN2ZT9'
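To try the same idea offline, Python's standard xml.etree.ElementTree can be used, with two caveats. It supports only a subset of XPath and has no ancestor axis, so the sketch below flips the search: it looks for the li that contains a bdi whose text is ASIN. It also needs well-formed markup, so the snippet below is a simplified, made-up stand-in for Amazon's real page. For real HTML, lxml offers full XPath including ancestor::. The function name li_text_for_label is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup; the real Amazon page differs.
HTML = """
<div class="content">
  <ul>
    <li><b><bdi dir="ltr">Shipping Weight</bdi>:</b> 33 pounds</li>
    <li><b><bdi dir="ltr">ASIN</bdi>:</b> B07QKN2ZT9</li>
  </ul>
</div>
"""


def li_text_for_label(root, label):
    """Return the full text of the first li holding a bdi whose text is label."""
    for li in root.iter("li"):
        # [.='...'] compares an element's full text content (Python 3.7+).
        if li.find(f".//bdi[.='{label}']") is not None:
            return "".join(li.itertext())
    return None


root = ET.fromstring(HTML.strip())
text = li_text_for_label(root, "ASIN")
print(text)                 # ASIN: B07QKN2ZT9
print(text.split(": ")[1])  # B07QKN2ZT9
```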
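XPath quick reference:
//<tag>[<condition>]/<child tag> >>> general form; // matches at any depth, a single / selects direct children
//bdi[text() = 'ASIN'] >>> bdi element whose text is 'ASIN'
//bdi[@dir = 'ltr'] >>> bdi element whose dir attribute equals 'ltr'
<expression>/ancestor::<tag> >>> ancestors of the matched element with that tag
//bdi[text()='ASIN']/ancestor::li >>> the li containing that bdi
//bdi[text()='ASIN']/ancestor::ul >>> the ul containing that bdi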
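The ancestor:: step is the one entry in this reference that Python's standard ElementTree cannot express directly, because its elements do not store their parent. A common workaround, sketched here with a made-up, well-formed snippet, is to build a child-to-parent map once and walk upward until an element with the wanted tag is found. The attribute predicate @dir='ltr' does work in ElementTree as written. The helper name ancestor is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the product details list.
root = ET.fromstring(
    '<div><ul><li><bdi dir="ltr">ASIN</bdi>: B07QKN2ZT9</li></ul></div>'
)

# Iterating an element yields its children, so this maps every child to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor(elem, tag):
    """Emulate ancestor::tag by walking up the parent map, nearest first."""
    current = parent_of.get(elem)
    while current is not None:
        if current.tag == tag:
            return current
        current = parent_of.get(current)
    return None


bdi = root.find(".//bdi[@dir='ltr']")  # like //bdi[@dir = 'ltr']
print(ancestor(bdi, "li").tag)  # li
print(ancestor(bdi, "ul").tag)  # ul
print(ancestor(bdi, "table"))   # None
```

The loop compares against None explicitly because an ElementTree element with no children is falsy.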
You can check this as a reference.