Long-time lurker here. I've never had to ask a question before because there is so much useful stuff on here. I'm a Python newbie and I feel like the answer to this should be obvious, but I've been staring at it for an hour.
I'm trying to scrape the "Name" column from https://apps.neb-one.gc.ca/REGDOCS/Search/SearchAdvancedResults?p=4 and here is my simple code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(chrome_options=options)
driver.get('https://apps.neb-one.gc.ca/REGDOCS/Search/SearchAdvancedResults?p=4')
driver.implicitly_wait(5)
rows = driver.find_elements_by_xpath('//*[@id="details-elements"]/table/tbody/tr')
output = []
for row in rows:
    title = row.find_element_by_xpath('//*[@id="details-elements"]/table/tbody/tr/td[1]/details/summary/a').get_attribute('text')
    output.append(title)
driver.close()
print(output)
It partly works, but for some reason the code only returns a list of 20 items (the correct length) that consists of the Name (the correct column) of the first row, repeated (ugh...so close). Like this:
['Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5']
What simple thing am I overlooking? Thanks in advance!
Try the code below to get the required output:
output = [item.text for item in driver.find_elements_by_tag_name('summary')]
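P.S. Note that if you want to get the descendants of each row, you need to put a dot (the context node) at the beginning of the XPath expression. An expression that starts with // is evaluated from the document root even when it is called on a row, which is why your loop returned the first row's link every time: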
for row in rows:
    row.find_element_by_xpath('.//descendant_node')  # '//descendant_node' always returns the first matching node in the DOM
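To see the difference without launching a browser, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium. The markup and the "Doc A/B/C" names are made up for illustration and only loosely mirror the structure of the real page. ElementTree does not accept a leading // on an element, so the "search from the document root" behaviour is imitated by calling find on the root instead of on the row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results table on the page.
HTML = """
<div id="details-elements">
  <table><tbody>
    <tr><td><details><summary><a>Doc A</a></summary></details></td></tr>
    <tr><td><details><summary><a>Doc B</a></summary></details></td></tr>
    <tr><td><details><summary><a>Doc C</a></summary></details></td></tr>
  </tbody></table>
</div>
""".strip()

root = ET.fromstring(HTML)
rows = root.findall("./table/tbody/tr")

# Searching from the document root on every iteration imitates '//...':
# the current row is ignored, so the first match in the document is
# returned each time.
from_root = [root.find(".//summary/a").text for _ in rows]

# Searching from the row itself imitates './/...':
# each row yields its own link.
from_row = [row.find(".//summary/a").text for row in rows]

print(from_root)  # ['Doc A', 'Doc A', 'Doc A']
print(from_row)   # ['Doc A', 'Doc B', 'Doc C']
```

The first list reproduces the symptom from the question (correct length, first row repeated), and the second is what the leading dot gives you.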