How do I parse two elements that are stuck together?

Question

I want to get rating and numVotes from zomato.com, but the two elements appear to be stuck together. It is hard to explain, so I made a quick video showcasing what I mean.

https://streamable.com/sdh0w

Entire code: https://pastebin.com/JFKNuK2a

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/san-francisco/restaurants?q=restaurants&page=1",headers=headers)
content = response.content
bs = BeautifulSoup(content,"html.parser")

zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})


for zomato_container in zomato_containers:
    rating = zomato_container.find('div', {'class': 'search_result_rating'})
    # numVotes = zomato_container.find("div", {"class": "rating-votes-div"})

    print("rating: ", rating.get_text().strip())
    # print("numVotes: ", numVotes.get_text().strip())

Answer 1

You can use the re module to parse the vote count:

import re
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/san-francisco/restaurants?q=restaurants&page=1",headers=headers)
content = response.content
bs = BeautifulSoup(content,"html.parser")

zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})

for zomato_container in zomato_containers:
    print('name:', zomato_container.select_one('.result-title').get_text(strip=True))
    print('rating:', zomato_container.select_one('.rating-popup').get_text(strip=True))
    votes = ''.join( re.findall(r'\d', zomato_container.select_one('[class^="rating-votes"]').text) )
    print('votes:', votes)
    print('*' * 80)

Prints:

name: The Original Ghirardelli Ice Cream and Chocolate...
rating: 4.9
votes: 344
********************************************************************************
name: Tadich Grill
rating: 4.6
votes: 430
********************************************************************************
name: Delfina
rating: 4.8
votes: 718
********************************************************************************

...and so on.

Alternatively, if you don't want to use re, you can use str.split():

votes = zomato_container.select_one('[class^="rating-votes"]').get_text(strip=True).split()[0]
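The two approaches differ slightly once the count contains a thousands separator. The sketch below runs both on plain strings so it needs no network access; the sample strings and the helper names votes_with_regex and votes_with_split are made up for illustration and are not taken from the Zomato page.

```python
import re


def votes_with_regex(text):
    """Keep only the digits, so '1,234 votes' becomes '1234'."""
    return ''.join(re.findall(r'\d', text))


def votes_with_split(text):
    """Take the first whitespace-separated token, so '1,234 votes' stays '1,234'."""
    return text.split()[0]


# Hypothetical strings standing in for the text of the votes element.
sample = '  344 votes  '
with_comma = '1,234 votes'

print(votes_with_regex(sample), votes_with_split(sample))
print(votes_with_regex(with_comma), votes_with_split(with_comma))
```

If the value is going to be converted with int(), the regex version is probably the safer of the two, since it drops the comma.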

Answer 2

Based on what your clip shows, you should make your selectors more specific so that they target the appropriate child elements rather than the parent. At present, by targeting the parent, you also get the text of the unwanted extra child. To get the rating-votes element, you can use a CSS attribute selector with the starts-with operator (^=).

This

[class^=rating-votes-div]

says: match elements whose class attribute value starts with rating-votes-div.
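To see why the parent selector gives the "stuck together" text and the child selectors do not, here is a small offline sketch. The HTML is a hypothetical fragment that only imitates the structure described in the question (the numeric suffix and the grey-text class are invented), so treat it as an illustration rather than the real Zomato markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a parent div holding the rating and the vote count.
html = """
<div class="search_result_rating">
  <div class="rating-popup">4.9</div>
  <div class="rating-votes-div-16843 grey-text">344 votes</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Targeting the parent concatenates the text of both children.
parent_text = soup.select_one(".search_result_rating").get_text(strip=True)

# Targeting each child separately keeps the two values apart.
rating = soup.select_one(".rating-popup").get_text(strip=True)
votes = soup.select_one('[class^="rating-votes-div"]').get_text(strip=True)

print("parent:", parent_text)
print("rating:", rating)
print("votes:", votes)
```

Note that ^= is tested against the whole class attribute string, so this selector only matches when the rating-votes-div... class comes first in the attribute.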


import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/san-francisco/restaurants?q=restaurants&page=1",headers=headers)
content = response.content
bs = BeautifulSoup(content,"html.parser")

zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})


for zomato_container in zomato_containers:
    name = zomato_container.select_one('.result-title').text.strip()
    rating = zomato_container.select_one('.rating-popup').text.strip()
    numVotes = zomato_container.select_one('[class^=rating-votes-div]').text.strip()
    print('name: ', name)
    print('rating: ', rating)
    print('votes: ', numVotes)
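One thing to watch for in a loop like this: select_one returns None when nothing matches, so a card without a rating would raise an AttributeError on .text. Whether such cards actually occur on the page is an assumption here. A minimal sketch of a guard, using a hypothetical helper text_or_default and invented sample markup:

```python
from bs4 import BeautifulSoup


def text_or_default(container, selector, default="N/A"):
    """Return the stripped text of the first match, or default if nothing matches."""
    element = container.select_one(selector)
    if element is None:
        return default
    return element.get_text(strip=True)


# Hypothetical markup: the second card has no rating or votes element.
html = """
<div class="search-snippet-card">
  <a class="result-title">Rated Place</a>
  <div class="rating-popup">4.6</div>
  <div class="rating-votes-div-1">430 votes</div>
</div>
<div class="search-snippet-card">
  <a class="result-title">New Place</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for card in soup.select(".search-snippet-card"):
    results.append({
        "name": text_or_default(card, ".result-title"),
        "rating": text_or_default(card, ".rating-popup"),
        "votes": text_or_default(card, '[class^="rating-votes-div"]'),
    })

for row in results:
    print(row)
```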

solution1
0 ACCEPTED 2019-08-06 07:46:50

solution2
0 2019-08-06 11:07:20
