
How to extract href links from anchor tags using BeautifulSoup?

I've been trying to extract just the links corresponding to the jobs on each page, but for some reason they don't print when I execute the script. No errors occur. For the inputs I entered "engineering" and "toronto" respectively. Here is my code.

import requests
from bs4 import BeautifulSoup
import webbrowser

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

all_job_url = []

for tag in prettify.find_all('div', {'data-tn-element':"jobTitle"}):
    for links in tag.find_all('a'):
        print (links['href'])

You should be looking for the anchor a tag. It looks like this:

<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=3611ac98c0167102&amp;fccid=459dce363200e1be" ...>Project <b>Engineer</b></a>

Call soup.find_all on the a tags that carry the data-tn-element="jobTitle" attribute, iterate over the result set, and extract each link through its href attribute.

import requests
from bs4 import BeautifulSoup

# valid query, replace with something else
url = "https://ca.indeed.com/jobs?q=engineer&l=Calgary%2C+AB" 

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

all_job_url = []    
for tag in soup.find_all('a', {'data-tn-element':"jobTitle"}):
    all_job_url.append(tag['href'])
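Note that the extracted href values are relative paths (e.g. /rc/clk?jk=...), so if you want full, clickable URLs you could join them with the site root. A minimal follow-up sketch, assuming the base URL from the question (https://ca.indeed.com):

from urllib.parse import urljoin

# The hrefs collected above are relative, so join them with the site
# root to turn them into absolute URLs.
base = "https://ca.indeed.com"
full_urls = [urljoin(base, href) for href in all_job_url]
for u in full_urls:
    print(u)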
