Web scraping using Python - Keep getting repeated 1st row values from jQuery table

Long time lurker here...never had to ask a question before; there is so much useful stuff on here. I'm a python newbie and I feel like the answer to this should be obvious, but I'm at the point where I've been staring at it for an hour.

I'm trying to scrape the "Name" column from https://apps.neb-one.gc.ca/REGDOCS/Search/SearchAdvancedResults?p=4. Here is my simple code:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')

driver = webdriver.Chrome(chrome_options=options)

driver.get('https://apps.neb-one.gc.ca/REGDOCS/Search/SearchAdvancedResults?p=4')

driver.implicitly_wait(5)

rows = driver.find_elements_by_xpath('//*[@id="details-elements"]/table/tbody/tr')

output = []

for row in rows:
    title = row.find_element_by_xpath('//*[@id="details-elements"]/table/tbody/tr/td[1]/details/summary/a').get_attribute('text')
    output.append(title)

driver.close()

print(output)

It partly works. But for some reason the code returns a list of 20 items (the correct length) that consists of the Name (the correct column) of the first row, repeated (ugh...so close). Like this:

['Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5', 'Receipt - Accusé de réception - A6F0I5']

What simple thing am I overlooking? Thanks in advance!

Try the code below to get the required output:

output = [item.text for item in driver.find_elements_by_tag_name('summary')]

PS: Note that if you want to get the descendants of each row, you need to put a dot (the context node) at the beginning of the XPath expression:

for row in rows:
    row.find_element_by_xpath('.//descendant_node')  # '//descendant_node' searches the whole document, so it always returns the first matching node in the DOM
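
To make the difference concrete without needing a browser, here is a minimal sketch that mimics the same behaviour with Python's standard-library xml.etree.ElementTree instead of Selenium. The markup and the "Doc A/B/C" names are a hypothetical, simplified stand-in for the real results table, and ElementTree supports only a subset of XPath, so treat this as an analogy rather than the exact Selenium calls:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results table (not the real page markup).
MARKUP = """
<div id="details-elements">
  <table>
    <tbody>
      <tr><td><details><summary><a>Doc A</a></summary></details></td></tr>
      <tr><td><details><summary><a>Doc B</a></summary></details></td></tr>
      <tr><td><details><summary><a>Doc C</a></summary></details></td></tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(MARKUP)
rows = root.findall("./table/tbody/tr")

# Mimics the bug: the search starts from the document root on every iteration,
# so the loop variable is ignored and the first match comes back each time.
document_wide = [root.find(".//summary/a").text for _row in rows]

# Mimics the fix: the search starts from the current row (the context node).
row_scoped = [row.find("./td[1]/details/summary/a").text for row in rows]

print(document_wide)  # ['Doc A', 'Doc A', 'Doc A']
print(row_scoped)     # ['Doc A', 'Doc B', 'Doc C']
```

Applied to the loop in the question, the same idea would mean starting the row-level XPath with a dot, for example './td[1]/details/summary/a', so that it is evaluated relative to each row rather than from the top of the page.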
