
How to get part of a text with Selenium WebDriver in Python

I want to use Selenium WebDriver to extract one piece of information from a page, namely the following value:

7197409


This is the page's HTML; I want to extract "7197409":

<script type="text/javascript">
  var messageid = 7197409;
  var highlight_id = -1;
  var authorOnly = "N";
  var ftype = 'MB';
  var adsenseFront = '<table width="99%" cellspacing="0" cellpadding="0" style="background-color: #000000; margin-left: auto; margin-right: auto;"><tr><td style="width: 100%; background-color: #F7F3F7;">';
  var adsenseEnd = '</td></tr></table>';
  var Submitted = false;
  var subject = true;
  var HiddenThreads = new Array(26); //Temp variable to save the threads temporary
  var blocked_list = Sys.Serialization.JavaScriptSerializer.deserialize('[]');
  var currentUser = undefined;
  var followList = [];
  var lock = false;
</script>

The full XPath I found for it is /html/body/form/div[5]/div/div/div[2]/div[1]/script/text()

I run the following code:

from datetime import date,datetime
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import numpy as np
import xlrd
import csv
import codecs
import time

url = "https://forumd.hkgolden.com/view.aspx?type=MB&message=7197409"
driver_blank=webdriver.Chrome('./chromedriver')
driver_blank.get(url)
id=driver_blank.find_element_by_xpath("/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()")
print("ID:"+id.text)

driver_blank.close()

However, I get the error below. It says that the result of the XPath expression "/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()" is [object Text], but it should be an element.

DevTools listening on ws://127.0.0.1:50519/devtools/browser/845d0800-1dd9-4f8a-a847-7d955c8cc5e3
libpng warning: iCCP: cHRM chunk does not match sRGB
[16136:16764:0411/213956.920:ERROR:ssl_client_socket_impl.cc(941)] handshake failed; returned -1, SSL error code 1, net_error -107
[16136:16764:0411/213957.351:ERROR:ssl_client_socket_impl.cc(941)] handshake failed; returned -1, SSL error code 1, net_error -107
Traceback (most recent call last):
  File ".\test.py", line 28, in <module>
    id=driver_blank.find_element_by_xpath("/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()")
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: The result of the xpath expression "/html/body/form/div[5]/div/div/div[2]/div[1]/script/text()" is: [object Text]. It should be an element.
  (Session info: chrome=80.0.3987.132)

I have two questions:

  1. How do I fix this error?

  2. How do I get only the text 7197409 from that same XPath location?

Can anyone help? Thanks.

First, find the script WebElement (driver is the WebDriver instance, named driver_blank in the question):

div = driver.find_element_by_id("ctl00_ContentPlaceHolder1_view_form")
script = div.find_element_by_tag_name('script')

Get the script's innerHTML:

text = script.get_attribute('innerHTML')
print(text)
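
As for the error in the question: an XPath ending in /text() selects a text node, while find_element can only return elements, which is what the InvalidSelectorException is complaining about. Dropping the trailing /text() step and reading innerHTML, as above, should avoid it. A minimal sketch, assuming the page layout from the question; the helper name read_script_source is made up for illustration and is not part of Selenium:

```python
def read_script_source(driver, xpath):
    """Return the source text of the <script> element located by `xpath`.

    `xpath` must select an element, so it must not end in /text().
    (Hypothetical helper, not a Selenium API.)
    """
    # "xpath" is the string value of Selenium's By.XPATH locator strategy.
    element = driver.find_element("xpath", xpath)
    # A <script> is not rendered, so .text is typically empty; read innerHTML.
    return element.get_attribute("innerHTML")


# Same path as in the question, minus the trailing /text() step.
SCRIPT_XPATH = "/html/body/form/div[5]/div/div/div[2]/div[1]/script"
```

With the driver from the question this would be called as read_script_source(driver_blank, SCRIPT_XPATH). An absolute path like this breaks as soon as the page layout changes, which is why looking the element up by id, as in the first step, is probably the more robust choice.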

Find the line containing "var messageid":

line = [l for l in text.split("\n") if "var messageid" in l][0]
print("Line:", line)

Get the number from the line:

ix_1 = line.find("=")
ix_2 = line.find(";")

number = int(line[ix_1+1:ix_2])
print("Number:", number)

Output (tested in Chromium 80.x):

Number: 7197409
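
The line search and slicing above can also be collapsed into one regular expression, which tolerates extra whitespace and returns None instead of raising when the variable is missing. This is only a sketch: the function name extract_messageid is made up here, and it assumes the value is always a plain non-negative integer, as in the snippet from the question.

```python
import re

# Matches "var messageid = <digits>;" with flexible whitespace.
MESSAGEID_RE = re.compile(r"\bvar\s+messageid\s*=\s*(\d+)\s*;")


def extract_messageid(script_text):
    """Return the integer assigned to `messageid`, or None if it is absent."""
    match = MESSAGEID_RE.search(script_text)
    return int(match.group(1)) if match else None


# Stand-in for the innerHTML string obtained from the script element.
sample = """
  var messageid = 7197409;
  var highlight_id = -1;
"""
print("Number:", extract_messageid(sample))
```

In the real script, extract_messageid(text) would take the innerHTML string read from the script element.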
