
How to extract href links from anchor tags using BeautifulSoup?

I've been trying to extract just the links corresponding to the jobs on each page, but for some reason they don't print when I execute the script. No errors occur. For the inputs I entered "engineering" and "toronto" respectively. Here is my code.

import requests
from bs4 import BeautifulSoup
import webbrowser

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

all_job_url = []

for tag in prettify.find_all('div', {'data-tn-element':"jobTitle"}):
    for links in tag.find_all('a'):
        print (links['href'])

You should be looking for the anchor a tag. It looks like this:

<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=3611ac98c0167102&amp;fccid=459dce363200e1be" ...>Project <b>Engineer</b></a>

Call soup.find_all on the a tags that carry the data-tn-element="jobTitle" attribute, iterate over the result set, and extract each link through its href attribute.

import requests
from bs4 import BeautifulSoup

# valid query, replace with something else
url = "https://ca.indeed.com/jobs?q=engineer&l=Calgary%2C+AB" 

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

all_job_url = []    
for tag in soup.find_all('a', {'data-tn-element':"jobTitle"}):
    all_job_url.append(tag['href'])
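Note that the extracted href values are relative paths (e.g. /rc/clk?jk=...), so if you want full, clickable URLs you could join them with the site root. A minimal follow-up sketch, assuming the base URL from the question (https://ca.indeed.com):

from urllib.parse import urljoin

# The hrefs collected above are relative, so join them with the site
# root to turn them into absolute URLs.
base = "https://ca.indeed.com"
full_urls = [urljoin(base, href) for href in all_job_url]
for u in full_urls:
    print(u)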
