Beautiful Soup 4 HTML parsing
I'm trying to extract the coefficient table for soccer from http://www.flashscore.com/. If you look at the page's source code, you can see that the table is inside a div with id="fs". But BeautifulSoup returns None when I search for that div. My script is below. What is wrong here?
Code
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.flashscore.com/")
soup = BeautifulSoup(r.content, "lxml")
print(soup.find(id="fs"))
You have to use Selenium because the data (the div with id "fs") is loaded with AJAX. When requests.get('http://www.flashscore.com/') is used, only the URL 'http://www.flashscore.com/' itself is requested; none of the AJAX requests associated with the page are made. See the code below, which uses Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://www.flashscore.com/")
try:
    # Wait up to 10 seconds for the AJAX-loaded div to appear in the DOM.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "fs"))
    )
finally:
    driver.quit()
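A rough sketch of why the original script prints None, and how the Selenium result could be handed back to BeautifulSoup. The two HTML strings are hypothetical stand-ins (not the real flashscore markup, and the cell value is a placeholder), and find_fs and rendered_html are illustrative helper names, not part of either library. BeautifulSoup only parses the markup it is given and never runs scripts, so an element that a script creates is absent from the raw response.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw response body: the div is only created by a script.
STATIC_HTML = """
<html><body>
<script>
  var d = document.createElement("div");
  d.id = "fs";
  document.body.appendChild(d);
</script>
</body></html>
"""

# Hypothetical stand-in for the DOM after a browser has run the script.
RENDERED_HTML = """
<html><body>
<div id="fs"><table><tr><td>1.85</td></tr></table></div>
</body></html>
"""


def find_fs(html):
    """Return the element with id="fs", or None if the markup lacks it."""
    return BeautifulSoup(html, "html.parser").find(id="fs")


def rendered_html(url, timeout=10):
    """Load a page in Firefox via Selenium and return the HTML after scripts ran."""
    # Imported lazily so the offline demo below runs without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Assumes the page really does insert an element with id="fs".
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "fs"))
        )
        return driver.page_source
    finally:
        driver.quit()


print(find_fs(STATIC_HTML))                # the script is parsed as text, never executed
print(find_fs(RENDERED_HTML) is not None)  # the div exists once the DOM is rendered
```

With a browser and driver installed, something like find_fs(rendered_html("http://www.flashscore.com/")) would parse the rendered page instead of the raw response, provided the page still uses that id.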
I couldn't find any div with id 'fs' on flashscore.com.
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.flashscore.com/")
soup = BeautifulSoup(r.text, "html.parser")
print(soup.find('div',id='fsbody'))
soup.find() returns only the first matching element; if you want every match, use the find_all() function.
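A small illustration of the difference, using a made-up HTML snippet (the class names and contents are assumptions for the example, not taken from flashscore.com):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = """
<div class="match">A</div>
<div class="match">B</div>
<div class="other">C</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first element that matches.
first = soup.find("div", class_="match")

# find_all() returns every matching element, in document order.
all_matches = soup.find_all("div", class_="match")

print(first.get_text())
print([d.get_text() for d in all_matches])

# When nothing matches, find() returns None and find_all() returns an empty list.
print(soup.find("div", id="fsbody"))
print(soup.find_all("div", id="fsbody"))
```

Checking the find() result for None before using it avoids an AttributeError when the element is missing from the fetched HTML.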