How to get text from a section of a website using Selenium in Python 3
I was wondering how I can pull text from a website using Selenium and Python 3. I don't know what the text is, so I can't just search for the sentence and copy it. Here is an example screenshot: Example Problem. Now, in this scenario I am looking for the small amount of text right after the "1.", but it is represented by just ::header, so I am having trouble grabbing it. Any ideas? Thanks! Also, the website I am pulling from is Quia.
It's hard to answer directly because this example page is behind a login. Broadly speaking, you can use XPath expressions, which require knowing the page's XML/HTML tree (for example, from the developer tools that open with F12 in Chrome or Firefox; "Inspect" in the right-click context menu also works). Here is an example that gets the welcome text from the login page of the same server:
from selenium import webdriver
from selenium.webdriver.common.by import By


def s_obj(sel_drv, xph):
    # Return every element matching the XPath expression (empty list if none).
    return sel_drv.find_elements(by=By.XPATH, value=xph)


def s_text(sel_drv, xph):
    # Join the visible text of all matches, turning line breaks into "; ".
    els = s_obj(sel_drv, xph)
    return '; '.join(el.text.replace('\n', '; ')
                     for el in els).strip(';').strip() if els else ''


test_url = "https://www.quia.com/web"
sel_drv = webdriver.Chrome()
sel_drv.get(test_url)
bs_xph = '//*/table/tbody/tr/td[@colspan="5"]/h1[@class="home"]'
expected_txt = s_text(sel_drv, f"{bs_xph}[1]")
print(expected_txt)
sel_drv.quit()
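
If you want to see how an expression like this walks the tree without opening a browser, here is a rough sketch using Python's standard-library xml.etree.ElementTree instead of Selenium. Note the assumptions: the markup in SAMPLE is made up for illustration and is not the real Quia page, texts_for is a hypothetical helper, and ElementTree only supports a subset of XPath, so the path starts with ".//" rather than "//*/".

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for this example; not the real Quia page.
SAMPLE = """
<html>
  <body>
    <table>
      <tbody>
        <tr>
          <td colspan="5">
            <h1 class="home">Welcome to the example site</h1>
          </td>
        </tr>
        <tr>
          <td colspan="5">
            <h1 class="other">Not selected</h1>
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
"""


def texts_for(root, path):
    """Return the stripped text of every element matching the path."""
    return [(el.text or "").strip() for el in root.findall(path)]


root = ET.fromstring(SAMPLE)

# Each step narrows the match: table -> tbody -> tr -> td with colspan="5"
# -> h1 with class="home". Only the first h1 satisfies the last predicate.
matches = texts_for(root, ".//table/tbody/tr/td[@colspan='5']/h1[@class='home']")
print(matches)
```

The attribute predicates in square brackets are what pick one h1 out of several, so once you find a distinguishing attribute in the inspector you can usually target the element even when you don't know its text in advance.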