How to get a part of text in Selenium and WebDriver using Python
I want to use Selenium and WebDriver to extract a piece of information from a page. Specifically, I want to get the following value:
7197409
The following is the page's HTML. I want to extract "7197409" from it:
<script type="text/javascript">
var messageid = 7197409;
var highlight_id = -1;
var authorOnly = "N";
var ftype = 'MB';
var adsenseFront = '<table width="99%" cellspacing="0" cellpadding="0" style="background-color: #000000; margin-left: auto; margin-right: auto;"><tr><td style="width: 100%; background-color: #F7F3F7;">';
var adsenseEnd = '</td></tr></table>';
var Submitted = false;
var subject = true;
var HiddenThreads = new Array(26); //Temp variable to save the threads temporary
var blocked_list = Sys.Serialization.JavaScriptSerializer.deserialize('[]');
var currentUser = undefined;
var followList = [];
var lock = false;
</script>
I checked, and the full XPath of that text is:
/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()
I use the following code to run it:
from datetime import date,datetime
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import numpy as np
import xlrd
import csv
import codecs
import time
url = "https://forumd.hkgolden.com/view.aspx?type=MB&message=7197409"
driver_blank=webdriver.Chrome('./chromedriver')
driver_blank.get(url)
id=driver_blank.find_element_by_xpath("/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()")
print("ID:"+id.text)
driver_blank.close()
However, I got the following error message:
The result of the xpath expression "/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()" is: [object Text]. It should be an element.
The full output is:
DevTools listening on ws://127.0.0.1:50519/devtools/browser/845d0800-1dd9-4f8a-a847-7d955c8cc5e3
libpng warning: iCCP: cHRM chunk does not match sRGB
[16136:16764:0411/213956.920:ERROR:ssl_client_socket_impl.cc(941)] handshake failed; returned -1, SSL error code 1, net_error -107
[16136:16764:0411/213957.351:ERROR:ssl_client_socket_impl.cc(941)] handshake failed; returned -1, SSL error code 1, net_error -107
Traceback (most recent call last):
  File ".\test.py", line 28, in <module>
    id=driver_blank.find_element_by_xpath("/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()")
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: The result of the xpath expression "/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()" is: [object Text]. It should be an element.
  (Session info: chrome=80.0.3987.132)
I want to ask two questions:
How do I solve the error?
How do I get only the text 7197409 from the same XPath location?
Can anyone help me? Thanks
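The XPath ends in text(), which selects a text node, while find_element_by_xpath can only return an element, hence the error. Locate the script element itself instead and read its content. First find the script WebElement (driver here is the question's driver_blank):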
div = driver.find_element_by_id("ctl00_ContentPlaceHolder1_view_form")
script = div.find_element_by_tag_name('script')
Get the script's innerHTML:
text = script.get_attribute('innerHTML')
print(text)
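As an aside, if the script source itself is not needed, the value could probably be read straight from the browser's JavaScript runtime, because a top-level var in a script tag normally becomes a global. The sketch below relies on Selenium's execute_script(script, *args) method. The helper name read_js_global and the FakeDriver class are hypothetical and exist only so the example runs without a browser. It also assumes messageid is still a global when the call is made.

```python
def read_js_global(driver, name):
    """Ask the browser for the value of a global JavaScript variable.

    `driver` is any object exposing Selenium's execute_script(script, *args).
    """
    # Selenium exposes the extra Python arguments to the script as arguments[0], ...
    return driver.execute_script("return window[arguments[0]];", name)


class FakeDriver:
    """Hypothetical stand-in for a real WebDriver, used only for this demo."""

    def __init__(self, page_globals):
        self.page_globals = page_globals

    def execute_script(self, script, *args):
        # Mimics only the single lookup used above, not real JavaScript.
        return self.page_globals.get(args[0])


fake = FakeDriver({"messageid": 7197409})
print(read_js_global(fake, "messageid"))
```

With a real session the call would be read_js_global(driver, "messageid"). The remaining steps show how to get the same value by parsing the script text.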
Find the line containing "var messageid":
line = [l for l in text.split("\n") if "var messageid" in l][0]
print("Line:", line)
Get the number from the line:
ix_1 = line.find("=")
ix_2 = line.find(";")
number = int(line[ix_1+1:ix_2])
print("Number:", number)
Output (tested in Chromium 80.x):
Number: 7197409
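The line splitting and index arithmetic above could also be folded into a single regular expression, which tolerates varying whitespace around the equals sign. This is a minimal sketch rather than part of the tested answer. The helper name extract_js_int is hypothetical, and the sample string is copied from the script block in the question.

```python
import re


def extract_js_int(script_text, var_name):
    """Return the integer assigned in `var <var_name> = <digits>;`, or None."""
    pattern = r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*(-?\d+)\s*;"
    match = re.search(pattern, script_text)
    return int(match.group(1)) if match else None


# Sample text copied from the script block in the question.
sample = """
var messageid = 7197409;
var highlight_id = -1;
var authorOnly = "N";
"""

print(extract_js_int(sample, "messageid"))     # 7197409
print(extract_js_int(sample, "highlight_id"))  # -1
print(extract_js_int(sample, "authorOnly"))    # None (value is not an integer)
```

In the answer's flow, the innerHTML string in text would be passed as script_text. Returning None for a missing variable avoids the IndexError that the list-based lookup would raise.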