I want to find all tags with an 'href' attribute on a webpage using Selenium, but the available methods are tag-specific, for example find_element_by_*(). I want the code to be tag-independent, i.e. get all tags that have an href attribute. Any help would be appreciated!
You can use an XPath expression or a CSS selector that matches on the attribute alone, regardless of the tag name:
>>> driver.get('https://www.python.org')
>>>
>>> len(driver.find_elements_by_css_selector('[href]'))
241
>>> len(driver.find_elements_by_xpath('//*[@href]'))
241
https://selenium-python.readthedocs.io/locating-elements.html#locating-by-xpath
https://selenium-python.readthedocs.io/locating-elements.html#locating-elements-by-css-selectors
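The same attribute-only filter can also be applied outside the browser, to HTML you have already fetched (for example the string returned by driver.page_source). The sketch below uses only Python's standard html.parser; HrefCollector and tags_with_href are hypothetical helper names, and the HTML literal is a made-up stand-in for a real page:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: records every start tag that carries an href attribute."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag_name, href_value) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; the parser lowercases tag and
        # attribute names. Self-closing tags such as <link .../> also end up here
        # through HTMLParser's default handle_startendtag.
        for name, value in attrs:
            if name == "href":
                self.found.append((tag, value))
                break


def tags_with_href(html_text):
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    return collector.found


# Made-up stand-in for driver.page_source
sample = """
<html><head><link rel="stylesheet" href="/style.css"/></head>
<body><a href="https://www.python.org">Python</a>
<area shape="rect" href="#top"><span>no link here</span></body></html>
"""

print(tags_with_href(sample))
# → [('link', '/style.css'), ('a', 'https://www.python.org'), ('area', '#top')]
```

This returns plain (tag, href) strings rather than WebElement objects, so nothing can be clicked afterwards; it only avoids per-element WebDriver round trips when all you need is the data. Keep in mind that page_source is whatever the browser serializes at that moment, which may differ from the HTML the server originally sent.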
Try this:
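from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://google.com')
# '//*[@href]' matches every element, whatever its tag, that has an href attribute
elements = driver.find_elements(By.XPATH, '//*[@href]')
# tag_name gives each element's tag, e.g. 'a' or 'link'
all_tags = [el.tag_name for el in elements]
print(all_tags)
driver.quit()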
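The all_tags list contains one entry per matching element, so on a real page it gets long quickly. A small sketch that tallies it with collections.Counter; the list literal below is illustrative only and is not real output from google.com:

```python
from collections import Counter

# Illustrative stand-in for the all_tags list built by the script above
all_tags = ["link", "a", "a", "link", "a", "base"]

# Count how many elements of each tag type carry an href attribute
summary = Counter(all_tags)
print(summary.most_common())  # → [('a', 3), ('link', 2), ('base', 1)]
```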
The following is taken from this answer: https://stackoverflow.com/a/27307235/15303240
It is not possible with the Selenium WebDriver API alone, but you can execute JavaScript code to get all of an element's attributes:
driver.execute_script('var items = {}; for (var index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', element)
Demo:
>>> from selenium import webdriver
>>> from pprint import pprint
>>> driver = webdriver.Firefox()
>>> driver.get('https://stackoverflow.com')
>>>
>>> element = driver.find_element_by_xpath('//div[@class="network-items"]/a')
>>> attrs = driver.execute_script('var items = {}; for (var index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', element)
>>> pprint(attrs)
{u'class': u'topbar-icon icon-site-switcher yes-hover js-site-switcher-button js-gps-track',
u'data-gps-track': u'site_switcher.show',
u'href': u'//stackexchange.com',
u'title': u'A list of all 132 Stack Exchange sites'}
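The values read from element.attributes are the raw attribute strings, so the href above is the protocol-relative '//stackexchange.com'. If absolute URLs are needed, they can be resolved against the page URL with the standard library's urllib.parse.urljoin. A minimal sketch; resolve_href is a hypothetical helper name and the page URL is an assumed example:

```python
from urllib.parse import urljoin


def resolve_href(page_url, href):
    """Turn a raw href attribute value into an absolute URL relative to page_url."""
    return urljoin(page_url, href)


page = "https://stackoverflow.com/"  # assumed URL of the page the element came from
print(resolve_href(page, "//stackexchange.com"))  # → https://stackexchange.com
print(resolve_href(page, "/users"))               # → https://stackoverflow.com/users
print(resolve_href(page, "#top"))                 # → https://stackoverflow.com/#top
```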
For completeness' sake, an alternative solution is to get the tag's outerHTML and parse the attributes with an HTML parser. Example (using BeautifulSoup):
>>> from bs4 import BeautifulSoup
>>> html = element.get_attribute('outerHTML')
>>> attrs = BeautifulSoup(html, 'html.parser').a.attrs
>>> pprint(attrs)
{u'class': [u'topbar-icon',
u'icon-site-switcher',
u'yes-hover',
u'js-site-switcher-button',
u'js-gps-track'],
u'data-gps-track': u'site_switcher.show',
u'href': u'//stackexchange.com',
u'title': u'A list of all 132 Stack Exchange sites'}
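If installing BeautifulSoup is not an option, Python's standard html.parser can do the same job on the outerHTML string. A minimal sketch; first_tag_attrs is a hypothetical helper, and the HTML literal is a simplified copy of the element from the demo. Unlike BeautifulSoup, which splits class into a list, this keeps every value as the raw string (or None for a value-less attribute such as disabled):

```python
from html.parser import HTMLParser


class _FirstTagAttrs(HTMLParser):
    """Captures the attributes of the first start tag in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            # attrs arrives as (name, value) pairs; value is None for bare attributes
            self.attrs = dict(attrs)


def first_tag_attrs(html_text):
    parser = _FirstTagAttrs()
    parser.feed(html_text)
    parser.close()
    return parser.attrs or {}


# With Selenium this string would come from element.get_attribute('outerHTML')
outer_html = ('<a href="//stackexchange.com" class="topbar-icon js-gps-track" '
              'data-gps-track="site_switcher.show">x</a>')
print(first_tag_attrs(outer_html))
```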