Selecting all visible text on a webpage using XPATH and Selenium in Python returns all text as one WebElement

I want to select all the visible text on a web page, with the text of each element/node in the DOM kept separate.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string keeps the backslashes literal
chrome_options = Options()
chrome_options.add_argument("--start-maximized")  # required, otherwise the results are affected

driver = webdriver.Chrome(PATH, chrome_options=chrome_options)

driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
elements = driver.find_elements_by_xpath("//html/body//*[@class!='visually-hidden']")
# the XPath above matches every element under body that has a class attribute
# whose value is not exactly 'visually-hidden' (elements without a class are not matched)

print(elements)

The problem I am facing is that the first element in the returned elements list contains the text of the entire web page. I would like the text of each node that satisfies the XPath expression to be a separate WebElement, so that I can read the properties of each one on its own.

Please help me out, thanks!

You should iterate over all the elements, get text from each one and print it, like this:

driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
elements = driver.find_elements_by_xpath("//html/body//*[@class!='visually-hidden']")
for element in elements:
    print(element.text)
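
One caveat with the loop above: element.text returns the rendered text of an element together with all of its descendants, so container elements such as the first match will still print large parts of the page. Below is a minimal sketch of one way around this, assuming the goal is one entry per element that directly contains text. It matches only elements that have a non-blank text node of their own, skips those that are not displayed, and reads just the direct text nodes with a small JavaScript snippet. The names OWN_TEXT_XPATH, OWN_TEXT_JS and own_visible_texts are illustrative and not part of Selenium; the string "xpath" is the value of By.XPATH.

```python
# XPath: elements under body (except script/style) that have at least one
# direct text node that is not just whitespace.
OWN_TEXT_XPATH = (
    "//body//*[not(self::script or self::style)]"
    "[text()[normalize-space()]]"
)

# JavaScript: join only the direct text-node children of the given element.
OWN_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(node => node.nodeType === Node.TEXT_NODE)
    .map(node => node.textContent)
    .join(' ');
"""


def own_visible_texts(driver):
    """Return (element, own_text) pairs, one per displayed element with its own text."""
    pairs = []
    for element in driver.find_elements("xpath", OWN_TEXT_XPATH):
        if not element.is_displayed():
            continue  # skip elements the browser does not render
        raw = driver.execute_script(OWN_TEXT_JS, element)
        own_text = " ".join(raw.split())  # collapse runs of whitespace
        if own_text:
            pairs.append((element, own_text))
    return pairs
```

With the driver from the question, `for element, text in own_visible_texts(driver): print(element.tag_name, text)` would print one line per element, and each element stays a regular WebElement whose other properties can be queried.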

Also, you should add a delay so that the page is fully loaded before you collect the elements and extract their text.
The simplest, though not recommended, way to do that is to add a sleep, like this:

import time

driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
time.sleep(10)  # fixed pause to give the page time to load
elements = driver.find_elements_by_xpath("//html/body//*[@class!='visually-hidden']")
for element in elements:
    print(element.text)
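
A fixed sleep either wastes time or turns out too short on a slow connection. As a hedged alternative, the sketch below polls the browser until document.readyState reports "complete" or a timeout expires. The helper name wait_for_page_ready and its timeout and poll parameters are made up for illustration; Selenium's own WebDriverWait offers the same kind of polling with ready-made conditions. Note that readyState only covers the initial document load, so content the page fetches later with JavaScript may still need a wait on a specific element.

```python
import time


def wait_for_page_ready(driver, timeout=10.0, poll=0.5):
    """Poll document.readyState until it is 'complete'.

    Returns True once the page reports 'complete', or False if `timeout`
    seconds pass first. `driver` is any object with Selenium's
    execute_script(script) method.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)  # short pause between checks
```

It would be called right after driver.get(...), for example `wait_for_page_ready(driver, timeout=15)`, in place of time.sleep(10).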
