How do I get the next pagination 'href'?
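I'm having trouble getting the href links for the next pages of this URL. I've gotten as far as pulling out all the text and everything else the tag contains, but I can't wrap my head around dropping the text I don't need so that I only get the href and can navigate through the pages.
Here is my code: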
import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

def all_next_pages():
    pages = prettify.find_all('div', {'class':'pagination'})
    for next_page in pages:
        next_page.find_all('a')
        nextpages.append(next_page)
        print(next_page)

all_next_pages()
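Here is a way to get the links of the search result items. Find the elements with the 'row result' class, then find the 'a' tag inside each one; it contains all the information you need.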
import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.text
prettify = BeautifulSoup(rcontent, "lxml")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

def all_next_pages():
    # Each search result sits in a div whose class attribute is ' row result'
    pages = prettify.find_all('div', {'class':' row result'})
    for next_page in pages:
        # The first 'a' tag in the result carries the job title and link
        info = next_page.find('a')
        if info is None:
            continue  # skip results that contain no link
        url = info.get('href')
        title = info.get('title')
        print(title,url)

all_next_pages()
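The code above prints the title and link of each search result. For the pagination links the question asks about, the same idea can be applied to the 'pagination' div: look up the 'a' tags inside it and read their 'href' attribute instead of printing the whole tag. The following is only a minimal sketch. It assumes the page still wraps its pager in a div with class 'pagination' (as in the question's code), and the names pagination_links, SAMPLE_HTML and BASE_URL, as well as the sample markup itself, are hypothetical stand-ins for a fetched page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Same site root as base_url in the question.
BASE_URL = "https://ca.indeed.com/"

# Hypothetical markup standing in for r.text from a real request.
SAMPLE_HTML = """
<div class="pagination">
  <b>1</b>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10">2</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=20">3</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10">Next</a>
</div>
"""


def pagination_links(html, base_url=BASE_URL):
    """Return the absolute URLs found in the pagination div, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for pager in soup.find_all("div", class_="pagination"):
        # href=True skips anchors that have no href attribute
        for anchor in pager.find_all("a", href=True):
            full_url = urljoin(base_url, anchor["href"])
            if full_url not in links:  # "Next" usually repeats a numbered link
                links.append(full_url)
    return links


if __name__ == "__main__":
    for link in pagination_links(SAMPLE_HTML):
        print(link)
```

With a live page you would pass r.text in place of SAMPLE_HTML and then request each returned URL in turn. Whether the real markup matches the sample is an assumption that needs checking against the page source.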