How to extract the text H MATTHEWS from the html using Selenium and Python

Question

How can I use the 'contains' function to extract information from this kind of HTML structure? I am trying to scrape the text "H MATTHEWS".

HTML:

<p>
<strong>Date Published:</strong>
&nbsp; 20 APRIL 2020
<br>
<strong>Closing Date / Time:</strong>
&nbsp;TUESDAY, 05 MAY 2020
<br>
<strong>Enquiries:</strong>
<br>
Contact Person: H MATTHEWS
<br>
Email:&nbsp;
</p>


Answer 1

The text Contact Person: H MATTHEWS is a bare text node inside the <p> element, and Selenium locators return elements, not text nodes. So to print the text, locate the parent <p>, induce WebDriverWait for visibility_of_element_located(), and then use either of the following Locator Strategies:

Using XPATH and childNodes :

 print(driver.execute_script('return arguments[0].childNodes[9].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]")))).strip())

Using XPATH and splitlines() :

 print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]"))).get_attribute("innerHTML").splitlines()[-3])

Note : You have to add the following imports:

 import re  # needed only for the re.split() variants further down
 from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.common.by import By
 from selenium.webdriver.support import expected_conditions as EC

If your use case is to extract only the text H MATTHEWS, split the string on the colon and strip the surrounding whitespace, using either of the following solutions:

Using XPATH and childNodes :

 print(re.split('[:]', driver.execute_script('return arguments[0].childNodes[9].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]")))).strip())[1].strip())

Using XPATH and splitlines() :

 print(re.split('[:]', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]"))).get_attribute("innerHTML").splitlines()[-3])[1].strip())
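
Both solutions above rely on a fixed position (a childNodes index or a line offset), which can shift if the page's whitespace or line order changes. As a rough sketch of an alternative, the name can be matched by its label instead. The helper name extract_contact_person and the sample string below are hypothetical, not part of the original answer; the function only assumes that the label "Contact Person:" precedes the name on the same line:

```python
import re

# Hypothetical helper (not from the original answer): captures whatever
# follows the "Contact Person:" label up to the end of the line or the next tag.
_CONTACT_RE = re.compile(r"Contact Person:[ \t]*([^\n<]+)")

def extract_contact_person(raw):
    """Return the name after 'Contact Person:', or None if the label is absent."""
    match = _CONTACT_RE.search(raw)
    return match.group(1).strip() if match else None

# Sample input mirroring part of the innerHTML of the <p> in the question.
sample = (
    "\n<strong>Enquiries:</strong>\n<br>\n"
    "Contact Person: H MATTHEWS\n<br>\nEmail:&nbsp;\n"
)
print(extract_contact_person(sample))  # prints: H MATTHEWS
```

With Selenium, the string passed in would presumably be the located element's get_attribute("innerHTML") or its text property, obtained with the same WebDriverWait call as above.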

