How can I use the XPath 'contains' function to extract information from this type of HTML structure? I am trying to scrape the text "H MATTHEWS".
HTML:
<p>
<strong>Date Published:</strong>
20 APRIL 2020
<br>
<strong>Closing Date / Time:</strong>
TUESDAY, 05 MAY 2020
<br>
<strong>Enquiries:</strong>
<br>
Contact Person: H MATTHEWS
<br>
Email:
</p>
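As an offline illustration of the `contains()` idea from the question, here is a minimal sketch that parses the `<p>` block above as a static string and selects the text node directly with an XPath predicate. It assumes the third-party `lxml` library is available; no browser is involved:

```python
from lxml import html

# The <p> block from the question, as a static string
snippet = """<p>
<strong>Date Published:</strong>
20 APRIL 2020
<br>
<strong>Closing Date / Time:</strong>
TUESDAY, 05 MAY 2020
<br>
<strong>Enquiries:</strong>
<br>
Contact Person: H MATTHEWS
<br>
Email:
</p>"""

tree = html.fromstring(snippet)
# text() selects the bare text nodes of <p>; contains() picks the one we want
node = tree.xpath('//p/text()[contains(., "Contact Person")]')[0]
# Split off the "Contact Person:" label and trim surrounding whitespace
contact = node.split(":", 1)[1].strip()
print(contact)  # H MATTHEWS
```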
The text Contact Person: H MATTHEWS is within a text node. So to print the text you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using XPATH and childNodes:

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//p[./strong[text()='Date Published:']]")))
print(driver.execute_script('return arguments[0].childNodes[9].textContent;', element).strip())
Using XPATH and splitlines():

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//p[./strong[text()='Date Published:']]")))
print(element.get_attribute("innerHTML").splitlines()[-3])
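To see why the index -3 works in the splitlines() approach, here is a small standalone sketch that mimics the innerHTML of the `<p>` element as a static string (assuming the browser preserves the line breaks shown in the question's HTML):

```python
# Simulated innerHTML of the <p>, mirroring the line breaks in the question
inner_html = (
    "<strong>Date Published:</strong>\n"
    "20 APRIL 2020\n"
    "<br>\n"
    "<strong>Closing Date / Time:</strong>\n"
    "TUESDAY, 05 MAY 2020\n"
    "<br>\n"
    "<strong>Enquiries:</strong>\n"
    "<br>\n"
    "Contact Person: H MATTHEWS\n"
    "<br>\n"
    "Email:"
)

# Counting from the end: [-1] is "Email:", [-2] is "<br>",
# so [-3] is the "Contact Person" line
line = inner_html.splitlines()[-3]
print(line)  # Contact Person: H MATTHEWS
```

Note this index is tied to the exact markup layout; if the page's innerHTML is formatted differently, the offset would change.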
Note: you have to add the following imports (re is needed for the solutions below):

import re
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
If your use case is to extract only the text H MATTHEWS, you can use either of the following solutions:
Using XPATH and childNodes:

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//p[./strong[text()='Date Published:']]")))
text = driver.execute_script('return arguments[0].childNodes[9].textContent;', element).strip()
print(re.split('[:]', text)[1])
Using XPATH and splitlines():

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//p[./strong[text()='Date Published:']]")))
print(re.split('[:]', element.get_attribute("innerHTML").splitlines()[-3])[1])
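The re.split() step in both solutions reduces to this small standalone example, shown on the literal line scraped from the page:

```python
import re

line = "Contact Person: H MATTHEWS"
# re.split('[:]', ...) splits on the colon; index [1] holds the value,
# which still carries a leading space, hence the strip()
name = re.split('[:]', line)[1].strip()
print(name)  # H MATTHEWS
```

Without the strip(), index [1] would be " H MATTHEWS" with a leading space.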