How to extract href links from anchor tags using BeautifulSoup?

Question

I've been trying to extract just the links for the jobs listed on each page, but for some reason they don't print when I run the script. No errors occur. For the inputs, I entered "engineering" and "toronto" respectively. Here is my code.

import requests
from bs4 import BeautifulSoup
import webbrowser

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

all_job_url = []

for tag in prettify.find_all('div', {'data-tn-element':"jobTitle"}):
    for links in tag.find_all('a'):
        print (links['href'])

Answer 1

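You should be searching for the anchor (a) tags directly. On this page the data-tn-element="jobTitle" attribute sits on the a tag itself rather than on a div, so find_all('div', ...) matches nothing, the loop body never runs, and nothing is printed. The tag looks like this: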

<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=3611ac98c0167102&amp;fccid=459dce363200e1be" ...>Project <b>Engineer</b></a>

Call soup.find_all and iterate over the result set, extracting the links through the href attribute.

import requests
from bs4 import BeautifulSoup

# valid query, replace with something else
url = "https://ca.indeed.com/jobs?q=engineer&l=Calgary%2C+AB"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# filter on the <a> tags themselves, since they carry the attribute
all_job_url = []
for tag in soup.find_all('a', {'data-tn-element': "jobTitle"}):
    all_job_url.append(tag['href'])  # hrefs are site-relative, e.g. /rc/clk?jk=...
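
If you want to check the extraction without hitting the site, here is a minimal offline sketch. It assumes the markup still looks like the anchor shown above (the page layout may well have changed since). The names build_search_url, extract_job_links, BASE_URL and SAMPLE_HTML are made up for illustration and are not part of requests or BeautifulSoup. It also percent-encodes the user input with urlencode and turns the relative hrefs into absolute URLs with urljoin, which the snippets above do not do.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

# Assumption: the site root used to resolve relative hrefs.
BASE_URL = "https://ca.indeed.com"


def build_search_url(job, location):
    """Build the search URL, percent-encoding whatever the user typed."""
    return BASE_URL + "/jobs?" + urlencode({"q": job, "l": location})


def extract_job_links(html, base_url=BASE_URL):
    """Return absolute URLs for every <a data-tn-element="jobTitle"> with an href."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href, so a["href"] cannot raise KeyError.
    anchors = soup.find_all("a", attrs={"data-tn-element": "jobTitle"}, href=True)
    return [urljoin(base_url, a["href"]) for a in anchors]


# Hypothetical sample markup modeled on the anchor quoted in this answer.
SAMPLE_HTML = """
<div class="row">
  <a class="turnstileLink" data-tn-element="jobTitle"
     href="/rc/clk?jk=3611ac98c0167102&amp;fccid=459dce363200e1be">Project <b>Engineer</b></a>
  <a data-tn-element="jobTitle">Anchor without an href</a>
  <a href="/other">Unrelated link</a>
</div>
"""

if __name__ == "__main__":
    print(build_search_url("engineer", "Calgary, AB"))
    for link in extract_job_links(SAMPLE_HTML):
        print(link)
```

To run it against the live page, pass requests.get(build_search_url(jobsearch, location)).content to extract_job_links. If the list comes back empty, inspect the page source first, since the attribute name may no longer match.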

Answer 1: 2 ACCEPTED 2017-07-05 01:44:25