Python Selenium driver.find_element().text returns empty string, but text is visible in the driver.page_source

I'm trying to scrape the titles of some videos with Selenium, but I've run into a problem: driver.find_element().text returns an empty string even though the title is definitely located at the given XPath. Here is the fragment of the page source returned by driver.page_source:

<div class="title"><a href="/f/4n3x7e31hpwxm8"target="_blank">Big.Sky.S03E01.ITA.WEBDL.1080p</a></div>

To find the title, I am trying to use:

hoverable = driver.find_element(By.XPATH, '//*[@id="videojs"]/div[1]')
ActionChains(driver).move_to_element(hoverable).perform()
wait = WebDriverWait(driver, 20)
title_from_url = wait.until(EC.visibility_of_element_located((By.XPATH, '/html/body/div[1]/a'))).text
title_from_url = driver.find_element(
    By.XPATH, '/html/body/div[1]/a'
    ).text.casefold()

From what I've read, this could happen because the page is not fully loaded (initially I wasn't using any wait condition). I then tried adding a wait condition and even time.sleep(), but it didn't change anything. (Mini question: what would a proper wait statement look like here?)

Edit: I think the problem occurs because the title shows up only when the mouse is in the player area. Some mouse movement is probably needed here. I have tried moving the mouse into the player area, and it works for a while, but after some time the title starts disappearing too fast. Is there a way to use find_element() while also moving the mouse?

Any help will be appreciated. Best regards, Ed.

Example site: https://mixdrop.to/e/4n3x7e31hpwxm8

You have to wait for the element to be completely loaded before extracting its text content. WebDriverWait explicit waits with expected_conditions should be used for that.
This should work if the element is visible on the page and the locator is correct:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)

title_from_url = wait.until(EC.visibility_of_element_located((By.XPATH, '//div[contains(@class, "title")]/a'))).text
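On the mini question about a proper wait statement: visibility_of_element_located only checks that the element is displayed, not that its text has been filled in. WebDriverWait.until() accepts any callable that takes the driver, so a custom condition can wait for non-empty text instead. The helper below is a minimal sketch; the name text_not_empty is hypothetical and not part of Selenium.

```python
def text_not_empty(by, locator):
    """Build a wait condition that yields the element's text once it is non-empty.

    text_not_empty is a hypothetical helper name, not a Selenium API.
    The returned callable is meant to be passed to WebDriverWait.until().
    """
    def _condition(driver):
        # find_elements returns an empty list instead of raising
        # when nothing matches the locator yet.
        elements = driver.find_elements(by, locator)
        if not elements:
            return False
        text = elements[0].text.strip()
        # until() keeps polling while the result is falsy and
        # returns the first truthy result, here the text itself.
        return text or False
    return _condition
```

With the imports from the snippet above it could be used as title_from_url = WebDriverWait(driver, 20).until(text_not_empty(By.XPATH, '//div[contains(@class, "title")]/a')). If the text never becomes visible, until() raises TimeoutException as usual.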

UPD
If the element content changes dynamically and we need to hover over that element to make the desired text appear, we can simulate the mouse hover action with the help of the ActionChains module.
The tooltip etc. will not disappear until you perform a click or a similar action on that page. So just performing a background action such as driver.find_element() will not affect that text.
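The question reports that on this page the title sometimes fades out again before it can be read. One hedge against that is to repeat the hover and the read together until the text is non-empty. The sketch below is plain Python and independent of Selenium; poll_for_text and its parameters are hypothetical names, and the two callables are supplied by the caller.

```python
import time


def poll_for_text(read_text, refresh=None, timeout=10.0, interval=0.25):
    """Call refresh() and then read_text() repeatedly until text is non-empty.

    read_text: callable returning the current text (possibly empty).
    refresh:   optional callable run before every read, e.g. a re-hover.
    Raises TimeoutError if no text shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if refresh is not None:
            refresh()  # e.g. hover again so the overlay is shown
        text = read_text()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no text after {timeout} seconds")
        time.sleep(interval)
```

With Selenium, read_text could be lambda: driver.find_element(By.XPATH, '//div[contains(@class, "title")]/a').text and refresh could be lambda: ActionChains(driver).move_to_element(player).perform(), where player is the hovered element. Exceptions raised by either callable are not caught here.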
UPD2
The title element is not visible. It becomes visible only when hovering over the player.
Here I'm performing such hovering and then getting the title text:

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

# Raw string so the backslashes in the Windows path are not treated as escape sequences
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 5)
actions = ActionChains(driver)

url = "https://mixdrop.to/e/4n3x7e31hpwxm8"
driver.get(url)

player = wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'player')))
actions.move_to_element(player).perform()
title = wait.until(EC.presence_of_element_located((By.XPATH, '//div[contains(@class, "title")]/a')))
print(title.text)

The output is:

Big.Sky.S03E01.ITA.WEBDL.1080p
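As an alternative that avoids hovering altogether: WebElement.text returns only text that is rendered as visible, which is why it is empty for a hidden element, whereas the DOM property textContent does not depend on visibility. The sketch below reads that property through get_property(). The helper name hidden_text is hypothetical, and this approach was not tried against the example site.

```python
def hidden_text(driver, by, locator):
    """Return an element's text even when the element is not displayed.

    hidden_text is a hypothetical helper name. It relies on the DOM
    property textContent, which is not affected by CSS visibility,
    unlike WebElement.text.
    """
    element = driver.find_element(by, locator)
    # get_property reads the live DOM property from the browser.
    raw = element.get_property("textContent")
    # Guard against a missing value and trim surrounding whitespace.
    return (raw or "").strip()
```

For the page in the question this could be called as hidden_text(driver, By.XPATH, '//div[contains(@class, "title")]/a'), ideally after waiting with presence_of_element_located so that the element exists in the DOM.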

If you suspect it's because of a synchronization issue, you can use Selenium waits, either implicit or explicit.

Implicit: objdriver.implicitly_wait(float)

Explicit:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# timeout and poll_frequency are floats (seconds); ignored_exceptions is an iterable of exception classes
objwait = WebDriverWait(driver, 10, poll_frequency=0.5, ignored_exceptions=[NoSuchElementException])

objelement = objwait.until(EC.visibility_of_element_located((By.XPATH, "Your XPATH")))
