Python Selenium won't print the entire page

I am trying to get the HTML code of a web page, but only about a quarter of the page shows up.

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.hltv.org/matches")

print(driver.page_source)

It feels like I have tried everything, but I still get the same result. The output doesn't start at the top; it starts far down, almost at the end.

Does anyone have a clue?

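Try the code below; it worked for me. It writes the page source to a file instead of printing it, so the result does not depend on how much output the console keeps.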

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.hltv.org/matches")

# Write the HTML to a file; "w" overwrites the file instead of appending on each run
with open("asd.html", "w", encoding="utf8") as file:
    file.write(driver.page_source)

driver.quit()
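
To confirm that the saved file really holds the whole page, a small check can help. The following is only a sketch: `save_html` and `looks_complete` are hypothetical helper names, not part of Selenium, and the completeness test is a rough heuristic that assumes the markup begins near an `<html` tag and ends with `</html>`.

```python
from pathlib import Path


def save_html(html: str, path: str) -> int:
    """Write html to path (overwriting it) and return the number of characters."""
    Path(path).write_text(html, encoding="utf8")
    return len(html)


def looks_complete(html: str) -> bool:
    """Rough check: an <html tag near the start and </html> at the very end."""
    text = html.strip().lower()
    return "<html" in text[:200] and text.endswith("</html>")
```

With the driver from the answer above, this would be called as `save_html(driver.page_source, "asd.html")`. If `looks_complete(driver.page_source)` returns True while the console still shows only the tail of the page, the truncation is probably happening in the console rather than in Selenium.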

This could be because the page has not finished loading when the print runs. By default, get returns once the browser reports the page as loaded, but content that JavaScript adds afterwards may still be missing at that point.

To fix this, you could wait for a known element to load before printing.

To wait for an element to load (the one with the ID "backToLoginDialog" in the example below), adjust your code as follows:

from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

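# set up the driver and the maximum time (in seconds) to wait for the element
driver = webdriver.Chrome()
timeout = 5

# block until the element with the given ID is present in the DOM;
# raises TimeoutException if it does not appear within `timeout` seconds
def wait_for_load(element_id):
    element_present = EC.presence_of_element_located((By.ID, element_id))
    WebDriverWait(driver, timeout).until(element_present)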

driver.get('https://www.hltv.org/matches')
wait_for_load('backToLoginDialog')
print(driver.page_source)
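
If no reliable element ID is known, one alternative is to poll the page source until it stops changing. The following is a minimal sketch under that assumption; `wait_until_stable` is a hypothetical helper, not a Selenium API, and an unchanged snapshot is only a heuristic, not proof that loading has finished.

```python
import time
from typing import Callable


def wait_until_stable(get_source: Callable[[], str],
                      timeout: float = 5.0,
                      poll: float = 0.5) -> str:
    """Poll get_source() until two consecutive reads match or timeout elapses.

    Returns the last snapshot either way, so the caller always gets some HTML.
    """
    deadline = time.monotonic() + timeout
    previous = get_source()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = get_source()
        if current == previous:
            # Two identical reads in a row: treat the page as settled.
            return current
        previous = current
    return previous
```

With Selenium this could be used as `html = wait_until_stable(lambda: driver.page_source)`. Taking a callable rather than the driver itself keeps the helper independent of the browser, at the cost of at least one extra `poll` interval per call.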
