How do I get the next pagination 'href'?

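I'm having trouble getting the href links for the next pages of this URL. I've gotten as far as retrieving all the text and everything else the tag contains, but I can't wrap my head around stripping out the text I don't need and getting just the href so I can navigate through the pages.

Here is my code: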

import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

def all_next_pages():
    pages = prettify.find_all('div', {'class':'pagination'})
    for next_page in pages:
        next_page.find_all('a')
        nextpages.append(next_page)
        print(next_page)

all_next_pages()

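Here is a way to get the links for the search result items. Find the divs with the row result class, then find the a tag inside each one; it contains all the information you need.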

import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.text
prettify = BeautifulSoup(rcontent, "lxml")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

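def all_next_pages():
    # Despite its name, this prints the job links found on the current results page.
    # The class string is matched exactly, including its whitespace.
    pages = prettify.find_all('div', {'class':'  row  result'})
    for next_page in pages:
        info = next_page.find('a')
        if info is None:
            continue  # skip result blocks that contain no link
        url = info.get('href')
        title = info.get('title')
        print(title, url)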

all_next_pages()
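
The snippet above collects the job links, not the pagination links the question asks about. A minimal sketch of that part follows, assuming the page still wraps its page links in a div with the pagination class, as in the question's code. The sample HTML and the function name pagination_links are hypothetical stand-ins so the example runs without a network request; the live markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://ca.indeed.com/"

# Hypothetical markup standing in for a fetched results page.
SAMPLE_HTML = """
<div class="pagination">
  <b>1</b>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10">2</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=20">3</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10"><span>Next</span></a>
</div>
"""


def pagination_links(html, base_url=BASE_URL):
    """Return the absolute URLs of all links inside div.pagination, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for container in soup.find_all("div", class_="pagination"):
        # href=True keeps only <a> tags that actually carry an href attribute.
        for anchor in container.find_all("a", href=True):
            absolute = urljoin(base_url, anchor["href"])
            if absolute not in links:
                links.append(absolute)
    return links


page_links = pagination_links(SAMPLE_HTML)
for link in page_links:
    print(link)
```

The key difference from the question's code is that the result of find_all('a') is kept and each tag's href attribute is read, instead of appending the whole pagination div. Each returned URL can then be passed to requests.get to fetch the next page.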
