
BeautifulSoup can't find all tags

My goal is to get the number of specific tags from the links I want to scrape. I have manually inspected the number of tags on each page, but my code can't find all of them.

I've tried different parsers like "html.parser", "html5lib" and "lxml", but the problem occurs every time.

My code:

from bs4 import BeautifulSoup
from selenium import webdriver
urls = ["http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502491&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502451&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502395&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502407&season_id=93783&league_id=4#mbt:2-400$t&0=1"]

for url in urls:
    browser = webdriver.PhantomJS()
    browser.get(url)
    table = BeautifulSoup(browser.page_source, 'lxml')
    print(len(table.find_all("tr", {"class": ["row1", "row2"]})))
    browser.quit()  # close the browser so each iteration doesn't leak a process

Output:

88
87
86
66
86
59

Goal output:

88
86
87
87
86
83

I basically just added a delay to your code. This makes the program wait until the webpage is fully loaded and ready for parsing with BS4.

Also note that my output is different from your goal output. But I double-checked the number of "tr" elements containing "row1" and "row2" on each URL, and my output appears to be accurate (perhaps the results on the website changed a bit after you posted the question).

Code:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

urls = ["http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502491&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502451&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502395&season_id=93783&league_id=4#mbt:2-400$t&0=1",
"http://www.basket.fi/sarjat/ottelu/?game_id=3502407&season_id=93783&league_id=4#mbt:2-400$t&0=1"]

for url in urls:
    driver = webdriver.Chrome()
    driver.get(url)
    time.sleep(10)  # wait for the JavaScript-rendered table to finish loading
    table = BeautifulSoup(driver.page_source, 'lxml')
    print(len(table.find_all("tr", {"class": ["row1", "row2"]})))
    driver.quit()  # close the browser so each iteration doesn't leak a process

Output:

88
87
86
87
86
83
