How to scrape information inside an unordered list selenium + python

Question

I am working on a web scraping project in which I scrape information from the Amazon website. The product page contains an unordered list with information such as:

Item Weight: 17.2 pounds
Shipping Weight: 17.4 pounds (View shipping rates and policies)
ASIN: B00HC767P6
UPC: 766789717088 885720483186 052000201628
Item model number: mark-1hooi-toop842
Customer Reviews: 4.8 out of 5 stars, 1,352 customer ratings
Amazon Best Sellers Rank: #514 in Grocery & Gourmet Food (See Top 100 in Grocery & Gourmet Food)
#12 in Sports Drinks

The list itself has no class. The problem is that I do not want all the information from the list, only the ASIN code, and the li tags have no specific class or ID either. Here is the link to the product details page

Before switching to Selenium, I was using BeautifulSoup, and this is how I tackled the issue:

asin = str(soup.find('bdi', {'dir': 'ltr'}).find_parent('li'))[38:].split('<')[0]

I am now switching to Selenium. How do I scrape the ASIN with it?

Answer 1

You can use a CSS selector to get the relevant li item, as follows:

Finding the child element by index with a CSS selector

$(".content > ul > li:nth-child(2)").textContent >>> "Shipping Weight: 33 pounds (View shipping rates and policies)"
$(".content > ul > li:nth-child(3)").textContent >>> "ASIN: B07QKN2ZT9"

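The equivalent Python Selenium code (recent Selenium 4 releases removed the find_element_by_* helpers, so this uses find_element with a By locator):

from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, ".content > ul > li:nth-child(3)").text.split(": ")[1] >>> 'B07QKN2ZT9'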
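The trailing .text.split(": ")[1] assumes the label and value are separated by exactly ": ". It raises IndexError for a row rendered as "ASIN:B07QKN2ZT9", and it silently drops anything after a second ": " in the value. A minimal sketch of a slightly more forgiving helper (the name value_after_label is hypothetical, not part of Selenium) splits only on the first colon:

```python
def value_after_label(text: str) -> str:
    """Return what follows the first ':' in a 'Label: value' row.

    Hypothetical helper: pass it the .text of the <li> found by Selenium.
    """
    _, sep, value = text.partition(":")
    if not sep:
        # No colon at all: this row is not a "Label: value" pair.
        raise ValueError(f"expected 'Label: value', got {text!r}")
    return value.strip()


print(value_after_label("ASIN: B07QKN2ZT9"))  # B07QKN2ZT9
print(value_after_label("ASIN:B07QKN2ZT9"))   # B07QKN2ZT9
```

With Selenium, the element's text is simply passed through it, e.g. value_after_label(element.text).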

Finding the ancestor element by XPath

If the ASIN is not always at the same index, you can instead find the bdi element whose text is 'ASIN', go up to its ancestor::li, then take that element's text and extract the relevant part:

from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "//bdi[text()='ASIN']/ancestor::li").text.split(": ")[1] >>> 'B07QKN2ZT9'
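The standard library's ElementTree does not support the ancestor:: axis, but the same idea (find the bdi label, then take the li that encloses it) can be tried offline by walking from each li down to its bdi descendants. The markup below is a hypothetical, simplified stand-in built from the question's list; Amazon's actual HTML is not shown and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the product-details list.
SAMPLE = """
<div class="content">
  <ul>
    <li><b>Item Weight:</b> 17.2 pounds</li>
    <li><b>Shipping Weight:</b> 17.4 pounds</li>
    <li><b><bdi dir="ltr">ASIN</bdi>:</b> B00HC767P6</li>
    <li><b>Item model number:</b> mark-1hooi-toop842</li>
  </ul>
</div>
"""


def li_text_containing_label(root, label):
    # No ancestor:: axis in ElementTree, so search from the li side:
    # return the text of the first <li> that has a <bdi> descendant
    # whose text is exactly `label`.
    for li in root.iter("li"):
        if any((bdi.text or "").strip() == label for bdi in li.iter("bdi")):
            return "".join(li.itertext()).strip()
    return None


root = ET.fromstring(SAMPLE.strip())
text = li_text_containing_label(root, "ASIN")
print(text)                     # ASIN: B00HC767P6
print(text.split(": ", 1)[1])   # B00HC767P6
```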

Building an XPath

//<element type>[<attribute type> = <attribute value>]/<descendant>
//bdi[text() = 'ASIN'] >>> bdi element with text 'ASIN'
//bdi[@dir = 'ltr'] >>> bdi element whose dir attribute equals 'ltr'
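Writing these patterns by hand works until the text or attribute value itself contains a quote, which ends the XPath string literal early. XPath 1.0, the version browsers implement, has no escape character for string literals, so a value containing both quote kinds has to be assembled with concat(). A small sketch of builders for the two patterns above (the helper names are hypothetical):

```python
def xpath_literal(s: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    # Contains both quote kinds: split on ' and rejoin with a "'" piece.
    pieces = [f"'{part}'" for part in s.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def xpath_by_text(tag: str, text: str) -> str:
    # //<element type>[text() = <value>]
    return f"//{tag}[text() = {xpath_literal(text)}]"


def xpath_by_attr(tag: str, attr: str, value: str) -> str:
    # //<element type>[@<attribute type> = <attribute value>]
    return f"//{tag}[@{attr} = {xpath_literal(value)}]"


print(xpath_by_text("bdi", "ASIN"))          # //bdi[text() = 'ASIN']
print(xpath_by_attr("bdi", "dir", "ltr"))    # //bdi[@dir = 'ltr']
print(xpath_literal("Manufacturer's name"))  # "Manufacturer's name"
```

The label "Manufacturer's name" is only an illustrative value, not one taken from the page.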

Accessing an ancestor of an element

/ancestor::<ancestor element type>
//bdi[text()='ASIN']/ancestor::li >>> li
//bdi[text()='ASIN']/ancestor::ul >>> ul
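Putting the two approaches together, a minimal sketch (the function and constant names are hypothetical) tries the label-based XPath first, falls back to the index-based CSS selector, and only trusts a row whose label really is ASIN. It uses find_elements, which returns an empty list instead of raising NoSuchElementException when nothing matches.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH, CSS_SELECTOR = By.XPATH, By.CSS_SELECTOR
except ImportError:
    # Same string values as Selenium's By constants, so the logic can be
    # exercised with a stand-in driver when Selenium is not installed.
    XPATH, CSS_SELECTOR = "xpath", "css selector"

ASIN_XPATH = "//bdi[text()='ASIN']/ancestor::li"
ASIN_CSS = ".content > ul > li:nth-child(3)"  # index-based, page-specific


def get_asin(driver):
    """Return the ASIN text, or None if neither locator finds an ASIN row."""
    for by, locator in ((XPATH, ASIN_XPATH), (CSS_SELECTOR, ASIN_CSS)):
        for element in driver.find_elements(by, locator):
            label, sep, value = element.text.partition(":")
            # Check the label so a shifted nth-child row is not misread.
            if sep and label.strip() == "ASIN":
                return value.strip()
    return None
```

With a real browser this is called as get_asin(driver) once the product page has loaded.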

You can check this as a reference

Answer 1: accepted, 2020-06-02 11:12:19