Python/BS4 Navigating 'div class' not working with site

Question

I am trying to scrape data from this website (I'm not sure whether the link leads directly to the page I am scraping, but this is the page I am on).

My goal is to scrape the 1x2 payout data for each game. The 1x2 payout is the first column. Before I get into the payout column, I need to first navigate to the game's row, but this is where I am running into problems. My current code is:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.bet365.com/#/IP/')
soup = BeautifulSoup(res.text, 'html.parser')
games = soup.find_all('div', attrs={'class':'ipo-Competition ipo-Competition-open '})
print(games)

I expected the class ipo-Competition ipo-Competition-open to get me the data of the first row, but it simply returns [] when I run it.

Is there any reason that this code is not navigating to the row? Any 'div class' I try to access on this site does not appear to work, so I'm wondering if it's a BS4 compatibility issue?

Thanks in advance for your help!

Answer 1

Try looking at the source of the page you're scraping (enter view-source:https://www.bet365.com/#/IP/ in your browser address bar) and you'll see that what is actually visible on the screen is not what is behind that URL.
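One reason for the mismatch is that everything after the # in a URL is a fragment, which is handled by JavaScript in the browser and is not sent to the server. A minimal sketch using only the standard library shows how the URL from the question splits:

```python
from urllib.parse import urldefrag

url = 'https://www.bet365.com/#/IP/'

# urldefrag separates the part that goes over the network from the fragment.
base, fragment = urldefrag(url)

print(base)      # the URL the server actually receives
print(fragment)  # the part only the page's own JavaScript ever sees
```

So the request in the question fetches the site's front page, not the in-play view that the fragment selects in the browser.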

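The data you're looking for is loaded asynchronously through AJAX calls after the initial page load, so BeautifulSoup, which only parses the HTML that requests downloaded, won't get you very far there. Instead, you'll probably need to emulate the AJAX calls (your browser's developer tools list them in the Network tab) and figure out which responses populate the columns you want to parse.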
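To make the effect concrete, here is a small sketch of why the search comes back empty. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and both HTML strings are hypothetical stand-ins (not the real bet365 markup): one for the bare page that requests downloads, one for the DOM after the browser has run the page's scripts.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'class' and value:
                self.classes.update(value.split())


def classes_in(html):
    """Return the set of class names found in an HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical stand-in for res.text: an empty shell that a script fills in later.
shell_html = (
    '<html><body><div class="app-root"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical stand-in for the DOM after the browser has executed the script.
rendered_html = (
    '<div class="app-root">'
    '<div class="ipo-Competition ipo-Competition-open"></div>'
    '</div>'
)

print('ipo-Competition' in classes_in(shell_html))
print('ipo-Competition' in classes_in(rendered_html))
```

Running the same kind of check on the real res.text is a quick way to confirm whether a class you see in the browser's inspector exists in the downloaded HTML at all.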
