How to get a part of text in Selenium and web driver using Python

I want to use Selenium and WebDriver to extract a piece of information from a page. Specifically, I want to get the following value:

7197409

Below is the page's HTML. I want to extract "7197409" from it:

<script type="text/javascript">
  var messageid = 7197409;
  var highlight_id = -1;
  var authorOnly = "N";
  var ftype = 'MB';
  var adsenseFront = '<table width="99%" cellspacing="0" cellpadding="0" style="background-color: #000000; margin-left: auto; margin-right: auto;"><tr><td style="width: 100%; background-color: #F7F3F7;">';
  var adsenseEnd = '</td></tr></table>';
  var Submitted = false;
  var subject = true;
  var HiddenThreads = new Array(26); //Temp variable to save the threads temporary
  var blocked_list = Sys.Serialization.JavaScriptSerializer.deserialize('[]');
  var currentUser = undefined;
  var followList = [];
  var lock = false;
</script>

I checked, and its full XPath is /html/body/form/div[5]/div/div/div[2]/div[1]/script/text()

I run it with the following code:

from datetime import date,datetime
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import numpy as np
import xlrd
import csv
import codecs
import time

url = "https://forumd.hkgolden.com/view.aspx?type=MB&message=7197409"
driver_blank=webdriver.Chrome('./chromedriver')
driver_blank.get(url)
id=driver_blank.find_element_by_xpath("/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()")
print("ID:"+id.text)

driver_blank.close()

However, I get the error message below. The key part says: The result of the xpath expression "/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()" is: [object Text]. It should be an element.

DevTools listening on ws://127.0.0.1:50519/devtools/browser/845d0800-1dd9-4f8a-a847-7d955c8cc5e3
libpng warning: iCCP: cHRM chunk does not match sRGB
[16136:16764:0411/213956.920:ERROR:ssl_client_socket_impl.cc(941)] handshake failed; returned -1, SSL error code 1, net_error -107
[16136:16764:0411/213957.351:ERROR:ssl_client_socket_impl.cc(941)] handshake failed; returned -1, SSL error code 1, net_error -107
Traceback (most recent call last):
  File ".\test.py", line 28, in <module>
    id=driver_blank.find_element_by_xpath("/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()")
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: The result of the xpath expression "/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()" is: [object Text]. It should be an element.
  (Session info: chrome=80.0.3987.132)

I have two questions:

  1. How do I fix this error?

  2. How do I get only the text 7197409 from that same XPath location?

Can anyone help me? Thanks.

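Selenium's find_element methods can only return elements, never text nodes. An XPath that ends in /text() therefore raises InvalidSelectorException. To fix this, select the <script> element itself and then read its content. You can do that either by dropping the /text() step (e.g. /html/body/form/div[5]/div/div/div[2]/div[1]/script) or by locating the element as follows.

First, find the script WebElement: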

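from selenium.webdriver.common.by import By
# `driver` is the WebDriver instance (driver_blank in the question); find_element_by_* was removed in Selenium 4
div = driver.find_element(By.ID, "ctl00_ContentPlaceHolder1_view_form")
script = div.find_element(By.TAG_NAME, "script")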

Get the script's innerHTML:

text = script.get_attribute('innerHTML')
print(text)

Find the line containing "var messageid":

line = [l for l in text.split("\n") if "var messageid" in l][0]
print("Line:", line)

Extract the number from that line:

ix_1 = line.find("=")
ix_2 = line.find(";")

number = int(line[ix_1+1:ix_2])
print("Number:", number)
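The line-finding and slicing steps above can also be collapsed into a single regular-expression search. This version returns None when the variable is missing. The list-comprehension version would instead raise IndexError at `[0]`. This is a minimal sketch. `extract_messageid` is a hypothetical helper name, and the pattern assumes messageid is assigned a plain non-negative integer literal, as in the HTML shown in the question:

```python
import re
from typing import Optional

# Matches e.g. "var messageid = 7197409;" and captures the digits.
# Assumption: the value is a plain non-negative integer literal.
MESSAGEID_RE = re.compile(r"\bvar\s+messageid\s*=\s*(\d+)\s*;")


def extract_messageid(script_text: str) -> Optional[int]:
    """Hypothetical helper: return messageid as an int, or None if absent."""
    match = MESSAGEID_RE.search(script_text)
    return int(match.group(1)) if match else None


# A trimmed copy of the <script> contents from the question; in practice
# this would be the string returned by script.get_attribute('innerHTML').
sample = """
  var messageid = 7197409;
  var highlight_id = -1;
  var authorOnly = "N";
"""

print(extract_messageid(sample))
```

With the answer's variables, `extract_messageid(text)` would take the place of both the line lookup and the `find`/slice arithmetic.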

Output (tested in Chromium 80.x):

Number: 7197409
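For this particular page, the ID also appears in the thread URL as the `message` query parameter (`message=7197409`). If the URL is already known, the ID can be read without loading the page at all. This assumes the page's messageid always matches that parameter. That holds for the example in the question but is not guaranteed in general. A standard-library sketch:

```python
from urllib.parse import parse_qs, urlparse

# The thread URL from the question.
url = "https://forumd.hkgolden.com/view.aspx?type=MB&message=7197409"

# parse_qs maps each query parameter name to a list of its values.
params = parse_qs(urlparse(url).query)
message_id = int(params["message"][0])
print("Number:", message_id)
```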
