Beautiful Soup 4 HTML parsing

I'm trying to extract the coefficient table for soccer from 'http://www.flashscore.com/'. If you look at the page source, you can see that the table is inside a div with id="fs". But BeautifulSoup returns None when I search for that div. My script is below. What is wrong here?

Code
import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.flashscore.com/")
soup = BeautifulSoup(r.content, "lxml")
print(soup.find(id="fs"))

You have to use Selenium because the data (the div with id "fs") is loaded via AJAX. When requests.get('http://www.flashscore.com/') is used, only the 'http://www.flashscore.com/' URL itself is requested. None of the AJAX requests associated with the page are made, so the content they would load never appears in the response. See the code below, which uses Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

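driver = webdriver.Firefox()
driver.get("http://www.flashscore.com/")
try:
    # Wait up to 10 seconds for the JavaScript-rendered element to appear.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "fs"))
    )
    # Read the rendered markup before the browser is closed.
    print(element.get_attribute("outerHTML"))
finally:
    # Always close the browser, even if the wait times out.
    driver.quit()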
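
To make the difference between the raw response and the rendered page concrete, here is a minimal sketch that parses two hand-written HTML strings. Both strings, the "container" id, and the find_by_id helper are invented for illustration and are not taken from flashscore.com; they only mimic a page whose div is created by a script.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what a plain HTTP response might contain
# before any script has run.
static_html = """
<html><body>
  <div id="container"></div>
  <script>/* a script would fetch the data and build the div here */</script>
</body></html>
"""

# Hypothetical markup: the same page after a browser has run the script.
rendered_html = """
<html><body>
  <div id="container"><div id="fs"><table><tr><td>1.85</td></tr></table></div></div>
</body></html>
"""


def find_by_id(html, element_id):
    """Parse html and return the first element with the given id, or None."""
    return BeautifulSoup(html, "html.parser").find(id=element_id)


static_result = find_by_id(static_html, "fs")
rendered_result = find_by_id(rendered_html, "fs")

print(static_result)         # None: the div is not in the static markup
print(rendered_result.name)  # div
```

With Selenium, driver.page_source (read after the wait succeeds and before driver.quit() is called) would take the place of rendered_html here.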

I couldn't find any div with the id 'fs' on flashscore.com. The code below searches for the div with id 'fsbody':

import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.flashscore.com/")
soup = BeautifulSoup(r.text, "html.parser")
print(soup.find('div',id='fsbody'))

soup.find() returns only the first matching element. If you want every match, you can use the find_all() function.
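
As a small illustration of the difference between find() and find_all(), the sketch below runs both on a made-up table. The markup, the "match" class, and the team names are assumptions for the example, not the structure of the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example.
html = """
<table>
  <tr class="match"><td>Team A</td><td>1.85</td></tr>
  <tr class="match"><td>Team B</td><td>2.10</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching element.
first_row = soup.find("tr", class_="match")
first_cells = [td.get_text() for td in first_row.find_all("td")]

# find_all() returns a list with every matching element.
all_rows = soup.find_all("tr", class_="match")
row_count = len(all_rows)

print(first_cells)
print(row_count)
```

Note that find() returns None when nothing matches, while find_all() returns an empty list, so the two need different checks before their results are used.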
