How do I get the next pagination "href"?

Question

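I'm having trouble getting the href links for the next pages of this URL. I've got as far as retrieving all the text and everything else the tag contains, but I can't wrap my head around dropping the text I don't need so that I only get the href and can navigate through the pages.

Here is my code: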

import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

def all_next_pages():
    pages = prettify.find_all('div', {'class':'pagination'})
    for next_page in pages:
        next_page.find_all('a')
        nextpages.append(next_page)
        print(next_page)

all_next_pages()

Answer 1

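Here is one way to get the links for the search result items. Find the elements with the row result class, then find the a tag inside each one; it contains all the information you need.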

import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.text
prettify = BeautifulSoup(rcontent, "lxml")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

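def all_next_pages():
    # Each search result is a div carrying both the "row" and "result" classes;
    # matching on the class names avoids depending on the attribute's exact spacing.
    pages = prettify.select('div.row.result')
    for next_page in pages:
        info = next_page.find('a')
        if info is None:  # skip result blocks that have no link
            continue
        url = info.get('href')
        title = info.get('title')
        print(title, url)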

all_next_pages()
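
The same idea can be applied to the pagination block from the question. The following is only a minimal sketch: it assumes the page links are `a` tags inside a `div` with the class `pagination` (as in the question's code), and it runs against a hypothetical sample of markup rather than the live site, whose structure may differ. The names `SAMPLE_HTML` and `pagination_links` are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://ca.indeed.com/"

# Hypothetical markup standing in for the real pagination block.
SAMPLE_HTML = """
<div class="pagination">
  <b>1</b>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10">2</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=20">3</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10"><span>Next</span></a>
</div>
"""


def pagination_links(soup, base_url=BASE_URL):
    """Return absolute URLs for the links inside div.pagination, without duplicates."""
    links = []
    pagination = soup.find("div", class_="pagination")
    if pagination is None:  # no pagination block on this page
        return links
    # href=True keeps only the <a> tags that actually carry an href attribute.
    for a in pagination.find_all("a", href=True):
        absolute = urljoin(base_url, a["href"])  # turn "/jobs?..." into a full URL
        if absolute not in links:  # the "Next" link usually repeats a page link
            links.append(absolute)
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
page_urls = pagination_links(soup)
for page_url in page_urls:
    print(page_url)
```

Each URL in `page_urls` could then be passed to `requests.get` and parsed the same way as the first page. Reading `a["href"]` instead of printing the whole tag is what drops the unwanted text around the link.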
