Why is the HTML file in the HTTP response incomplete in this code?

Question

I'm trying to get some data from a website ( https://www.evaschulze-aufgabenpool.de/index.php/s/smwP6ygck2SXRtF ) using Python and the modules "requests" and "BeautifulSoup", but it seems I get an incomplete HTML file as a response. For example, the table tag in the HTML I get back with my code is missing rows compared to the original HTML when I inspect it in my browser. So my question is: what is the reason for this, and how can I solve it?

Here's the code I used to get the data inside the table tag:

import requests
from bs4 import BeautifulSoup

source = requests.get("https://www.evaschulze-aufgabenpool.de/index.php/s/smwP6ygck2SXRtF").text
soup = BeautifulSoup(source, "html.parser")
for table in soup.find_all("table"):
    print(table)

Answer 1

What happens?

The content of the table is generated dynamically and is not included in the response to your request. requests only fetches the initial HTML and does not run the page's scripts, so you have to wait until the page content has loaded in a browser.

What you can do is use Selenium, which drives a real browser:
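One way to confirm that the content is dynamically generated is to count the table rows in the raw response and compare that with what the browser shows. The following is a minimal sketch using only the standard library; the two HTML strings are hypothetical stand-ins (not the real markup of the site in the question), and `count_table_rows` is a helper name made up for this example.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Counts <tr> elements that appear inside any <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # greater than 0 while inside a <table>
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "tr" and self.table_depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def count_table_rows(html):
    """Return the number of table rows found in an HTML string."""
    parser = TableRowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical stand-in for the raw response: an empty table plus a script
STATIC_HTML = """
<html><body>
<table id="demo"><tbody></tbody></table>
<script src="fill-table.js"></script>
</body></html>
"""

# Hypothetical stand-in for the DOM after the script has filled the table
RENDERED_HTML = """
<html><body>
<table id="demo"><tbody>
<tr><td>first entry</td></tr>
<tr><td>second entry</td></tr>
</tbody></table>
</body></html>
"""

print(count_table_rows(STATIC_HTML))    # 0
print(count_table_rows(RENDERED_HTML))  # 2
```

If the same check on `requests.get(url).text` reports far fewer rows than the browser's inspector shows, the missing rows are most likely added by a script after the page loads.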

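from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
from time import sleep

url = "https://www.evaschulze-aufgabenpool.de/index.php/s/smwP6ygck2SXRtF"

# Selenium 4 takes the driver path through a Service object; the older
# executable_path keyword was deprecated in 4.0 and later removed.
service = Service(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver = webdriver.Chrome(service=service)

driver.get(url)
# driver.implicitly_wait(10)
# Crude fixed pause so the page's scripts can fill in the table
sleep(3)
# Parse the rendered DOM, not the raw HTTP response (requires the lxml package)
soup = BeautifulSoup(driver.page_source, "lxml")

for table in soup.find_all("table"):
    print(table)

# quit() ends the whole browser session, not just the current window
driver.quit()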
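The fixed sleep(3) above either wastes time or is too short on a slow connection. A common alternative is to poll until the content is actually there. The sketch below shows the idea with a small generic helper; `wait_until` is a hypothetical name written for this example, not part of Selenium, and the delayed "page" is simulated with a timer.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Simulated page: the rows only "appear" about 0.3 s after start
start = time.monotonic()


def find_rows():
    return ["row"] if time.monotonic() - start > 0.3 else []


rows = wait_until(find_rows, timeout=2.0, interval=0.05)
print(rows)  # ['row']
```

With the driver from the answer, the predicate could be something like `lambda: driver.find_elements("css selector", "table tr")`, assuming the rows are plain `tr` elements. Selenium also ships its own `WebDriverWait` class in `selenium.webdriver.support.ui`, which is usually the better choice in real code.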
