
Beautiful Soup 4 HTML parsing

I'm trying to extract the coefficient table for soccer from http://www.flashscore.com/. If you look at the page's source code, you can see that the table is inside a div with id="fs". But BeautifulSoup returns None when I search for that div. I wrote the script below. What is wrong here?

Code
import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.flashscore.com/")
soup = BeautifulSoup(r.content, "lxml")
print(soup.find(id="fs"))

You have to use Selenium because the data (the div with id fs) is loaded with AJAX. When requests.get('http://www.flashscore.com/') is called, only that one URL is requested: requests does not run the page's JavaScript, so none of the AJAX requests that fill in the table are made. Refer to the code below, which uses Selenium:

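from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get("http://www.flashscore.com/")
    # Wait up to 10 seconds for the AJAX-loaded div with id="fs" to appear
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "fs"))
    )
    # The browser has run the page's JavaScript, so page_source now includes that div
    soup = BeautifulSoup(driver.page_source, "lxml")
    print(soup.find(id="fs"))
finally:
    # Always close the browser, even if the wait times out
    driver.quit()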
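To see why requests plus BeautifulSoup returns None, here is a small offline sketch. The HTML string is a hypothetical, simplified stand-in for what the server sends (the real flashscore.com markup differs, and the "/some/ajax/endpoint" URL is made up). It only illustrates that BeautifulSoup parses the text it is given and never executes the <script> that would add the table in a browser.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the initial HTML returned by the server.
# The real page is different; this only shows where AJAX-loaded content is missing.
STATIC_HTML = """
<html>
  <body>
    <div id="fsbody"></div>
    <script>
      // In a browser, a script like this would fetch data via AJAX and insert
      // a div with id fs into the container. requests never runs it.
      fetch("/some/ajax/endpoint").then(r => r.text()).then(html => {
        document.getElementById("fsbody").innerHTML = html;
      });
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# BeautifulSoup only parses the markup; script contents are treated as plain text.
print(soup.find(id="fs"))       # -> None: the div is never in the static HTML
print(soup.find(id="fsbody"))   # -> <div id="fsbody"></div> (the empty container)
```

A browser driven by Selenium runs that script, which is why driver.page_source can contain the div while r.content from requests cannot.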

I couldn't find any div with the id 'fs' on flashscore.com. Searching for the div with id 'fsbody' instead:

import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.flashscore.com/")
soup = BeautifulSoup(r.text, "html.parser")
print(soup.find('div',id='fsbody'))

soup.find() returns only the first matching element (or None if nothing matches). If you want all matches, use find_all(), which returns a list of every matching element.
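A short sketch of the difference, using a small made-up document (the "event" class and the match names are hypothetical, not flashscore.com markup). Since an HTML id is meant to be unique, find_all() is usually more useful with a class or tag name than with an id.

```python
from bs4 import BeautifulSoup

# Small illustrative document: several rows share the same (hypothetical) class.
HTML = """
<div id="fsbody">
  <div class="event">Match A</div>
  <div class="event">Match B</div>
  <div class="event">Match C</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns only the first matching Tag
first = soup.find("div", class_="event")
print(first.get_text())                          # -> Match A

# find_all() returns a list of every matching Tag
all_events = soup.find_all("div", class_="event")
print([tag.get_text() for tag in all_events])    # -> ['Match A', 'Match B', 'Match C']

# With no match, find() gives None and find_all() gives an empty list
print(soup.find("div", class_="missing"))        # -> None
print(soup.find_all("div", class_="missing"))    # -> []
```

Checking for None (or an empty list) before using the result avoids an AttributeError when the element is not in the HTML you fetched.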
