
How can I get all the href links from a single page when web scraping using Python and Selenium?

I am working on my first programming project.

I am currently using an XPath method to obtain the links from a webpage; however, when the program runs it returns "[None]". I'm not sure why this is happening or how to solve it.

The href links are implemented in the html code like this:

<div class="fixed-recipe-card__info">
        <h3 class="fixed-recipe-card__h3">
            <a href="xyz" data-content-provider-id="" data-internal-referrer-link="rotd" class="fixed-recipe-card__title-link ng-isolate-scope" target="_self">
                <span class="fixed-recipe-card__title-link">Title</span>
            </a>
        </h3>

This is the code I've tried so far:

        from selenium import webdriver

        chrome_path = '/Users/name/Downloads/chromedriver'
        driver = webdriver.Chrome(executable_path=chrome_path)
        driver.get('https://www.website.com/')

        driver.implicitly_wait(10)

        # scrape for links on the page
        elems = driver.find_elements_by_xpath("//h3[@class='fixed-recipe-card__h3']")

        # store them in a list
        links = []

        for elem in elems:
            # fetch and store the links
            links.append(elem.get_attribute('href'))

        # remove the duplicates in the links list
        res = [i for n, i in enumerate(links) if i not in links[:n]]
        print(str(res))

elems = driver.find_elements_by_xpath("//h3[@class='fixed-recipe-card__h3']/a")

You're trying to get the href attribute for an h3 tag instead of the a tag.
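For example, a minimal sketch of the corrected loop, reusing the placeholder driver path, URL, and class names from the question, could look like this:

from selenium import webdriver

driver = webdriver.Chrome(executable_path='/Users/name/Downloads/chromedriver')
driver.get('https://www.website.com/')
driver.implicitly_wait(10)

# target the <a> elements inside each recipe-card heading, not the <h3> itself
elems = driver.find_elements_by_xpath("//h3[@class='fixed-recipe-card__h3']/a")

# collect hrefs, dropping duplicates while preserving order
links = []
for elem in elems:
    href = elem.get_attribute('href')
    if href and href not in links:
        links.append(href)

print(links)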

You can also use a CSS selector:

elems = driver.find_elements_by_css_selector(".fixed-recipe-card__h3 [href]")
links = [elem.get_attribute('href') for elem in elems]
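
Note that in Selenium 4 the find_elements_by_* helpers have been removed in favour of find_elements with a By locator, so the equivalent would be roughly:

from selenium.webdriver.common.by import By

# same CSS selector as above, using the Selenium 4 locator API
elems = driver.find_elements(By.CSS_SELECTOR, ".fixed-recipe-card__h3 [href]")
links = [elem.get_attribute('href') for elem in elems]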
