
How to extract href links from anchor tags using BeautifulSoup?

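I have been trying to extract only the links that correspond to the jobs on each results page. For some reason, though, nothing is printed when I run the script, and no error is raised. For the inputs I entered engineering and Toronto, respectively. Here is my code.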

import requests
from bs4 import BeautifulSoup
import webbrowser

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

all_job_url = []

for tag in prettify.find_all('div', {'data-tn-element':"jobTitle"}):
    for links in tag.find_all('a'):
        print (links['href'])
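The script above builds the query string by concatenating the raw input, so characters such as spaces, commas, or an & in the job title are not escaped by the script itself (an & would even start a new query parameter). A minimal sketch of the alternative using the standard library's urllib.parse.urlencode; build_search_url is a hypothetical helper name, and the Indeed base URL is simply the one used in the question.

```python
from urllib.parse import urlencode


def build_search_url(job, location):
    # Hypothetical helper: urlencode escapes each value (spaces become "+",
    # commas become "%2C", "&" becomes "%26") before joining the pairs.
    query = urlencode({"q": job, "l": location})
    return "https://ca.indeed.com/jobs?" + query


print(build_search_url("engineer", "Calgary, AB"))
# https://ca.indeed.com/jobs?q=engineer&l=Calgary%2C+AB
```

requests.get also accepts a params dict, e.g. requests.get("https://ca.indeed.com/jobs", params={"q": job, "l": location}), and performs the same encoding for you.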

You should be looking for the anchor (a) tags themselves. They look like this:

<a class="turnstileLink" data-tn-element="jobTitle" href="/rc/clk?jk=3611ac98c0167102&amp;fccid=459dce363200e1be" ...>Project <b>Engineer</b></a>
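To see why the question's loop prints nothing without raising an error, this sketch parses a trimmed copy of the anchor quoted above (assuming the live page still marks job titles this way; Indeed's markup may have changed since). find_all returns an empty result set when nothing matches, and looping over an empty result set simply does nothing.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the anchor markup quoted above (assumed structure).
html = (
    '<a class="turnstileLink" data-tn-element="jobTitle" '
    'href="/rc/clk?jk=3611ac98c0167102&amp;fccid=459dce363200e1be">'
    'Project <b>Engineer</b></a>'
)
soup = BeautifulSoup(html, "html.parser")

# The question's search: no <div> carries data-tn-element="jobTitle",
# so find_all returns an empty result set and the loop body never runs.
divs = soup.find_all("div", {"data-tn-element": "jobTitle"})
print(len(divs))  # 0

# Searching for the <a> tag itself matches the job link.
anchors = soup.find_all("a", {"data-tn-element": "jobTitle"})
for a in anchors:
    # The parser decodes &amp; to &, so the href can be used directly.
    print(a["href"])
    print(a.get_text())  # text of the tag and its children: "Project Engineer"
```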

Call soup.find_all for those anchors, iterate over the result set, and extract each link from its href attribute.

import requests
from bs4 import BeautifulSoup

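# example query; replace q (job) and l (location) with your own search
url = "https://ca.indeed.com/jobs?q=engineer&l=Calgary%2C+AB"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# The job title link is the <a> tag itself (it carries data-tn-element="jobTitle"),
# so search for anchors directly rather than for a wrapping <div>
all_job_url = []
for tag in soup.find_all('a', {'data-tn-element': "jobTitle"}):
    all_job_url.append(tag['href'])

print(all_job_url)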
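The hrefs collected this way are root-relative (like the /rc/clk?jk=... link in the markup above), so they need the site's scheme and host before they can be opened or requested. A minimal sketch using urllib.parse.urljoin from the standard library; absolutize is a hypothetical helper, and the base URL is the example query from the answer.

```python
from urllib.parse import urljoin

BASE_URL = "https://ca.indeed.com/jobs?q=engineer&l=Calgary%2C+AB"


def absolutize(hrefs, base=BASE_URL):
    # urljoin resolves a root-relative path against the scheme and host of
    # the page it was scraped from; already-absolute URLs pass through.
    return [urljoin(base, href) for href in hrefs]


# Sample href from the markup quoted above, with &amp; already decoded.
links = absolutize(["/rc/clk?jk=3611ac98c0167102&fccid=459dce363200e1be"])
print(links[0])
# https://ca.indeed.com/rc/clk?jk=3611ac98c0167102&fccid=459dce363200e1be
```

If the question's unused import webbrowser was meant for opening the results, each absolute URL could then be passed to webbrowser.open.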


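Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.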

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM