Beautiful Soup: Selecting by Class Has Unexpected Results

Question

I am new to programming and have been learning Python through web scraping. I am trying to capture the line below from the site in my URL:

<a class="" href="https://www.adweek.com?paged=776%3Fs%3Dinterpublic&orderby=date&s=interpublic">776</a>

but I cannot seem to get there. My code only returns the first line of the pagination information, and I can't figure out why. Any help would be greatly appreciated.

import requests
from bs4 import BeautifulSoup
url = 'https://www.adweek.com/?s=interpublic&orderby=date'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
k = soup.find_all('div', {'class': 'pagination-centered'})

It returns only:

[<div class="pagination-centered"><ul class="pagination">
 <li><span aria-current="page" class="current">1</span></li></ul></div>]

Thanks, Seth

Answer 1

You can get the pagination links using the a[href*="paged="] CSS selector:

import requests
from bs4 import BeautifulSoup

url = 'https://www.adweek.com/?s=interpublic&orderby=date'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')

# print text and href
pagination = soup.select('a[href*="paged="]')
for p in pagination:
    print(p.text.strip(), p.get('href'))

"Next" has the same URL as the first link, so you can use a set to keep only the unique hrefs:

pagination = {p['href'] for p in soup.select('a[href*="paged="]')}
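
If the order of the links matters, keep in mind that a set does not preserve it. Below is a minimal sketch of one alternative, using dict.fromkeys to drop duplicates while keeping first-seen order. The hrefs list is made-up sample data in the shape of the link from the question, not output from the live site:

```python
# Sample hrefs (assumed for illustration); the last entry plays the role
# of the "Next" link, which repeats an earlier URL.
hrefs = [
    "https://www.adweek.com?paged=2%3Fs%3Dinterpublic&orderby=date&s=interpublic",
    "https://www.adweek.com?paged=776%3Fs%3Dinterpublic&orderby=date&s=interpublic",
    "https://www.adweek.com?paged=2%3Fs%3Dinterpublic&orderby=date&s=interpublic",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates without reordering the links.
unique_hrefs = list(dict.fromkeys(hrefs))

for href in unique_hrefs:
    print(href)
```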

You can get the last page number and iterate by changing the paged parameter in the URL until you reach the last page.
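
A rough sketch of that idea, run against a static HTML sample rather than the live site. The sample markup is modeled on the link quoted in the question, and the helper names (page_number, last_page, page_urls) are hypothetical. It also assumes the site accepts a plain integer for paged, which is not confirmed here. Note that the paged value in the quoted href decodes to "776?s=interpublic", so only the leading digits are used:

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

from bs4 import BeautifulSoup

# Static sample (assumed); the real page markup may differ.
SAMPLE_HTML = """
<div class="pagination-centered"><ul class="pagination">
<li><span aria-current="page" class="current">1</span></li>
<li><a class="" href="https://www.adweek.com?paged=2%3Fs%3Dinterpublic&amp;orderby=date&amp;s=interpublic">2</a></li>
<li><a class="" href="https://www.adweek.com?paged=776%3Fs%3Dinterpublic&amp;orderby=date&amp;s=interpublic">776</a></li>
<li><a class="" href="https://www.adweek.com?paged=2%3Fs%3Dinterpublic&amp;orderby=date&amp;s=interpublic">Next</a></li>
</ul></div>
"""


def page_number(href):
    """Return the leading integer of the `paged` query parameter, or None."""
    values = parse_qs(urlsplit(href).query).get("paged", [])
    match = re.match(r"\d+", values[0]) if values else None
    return int(match.group()) if match else None


def last_page(html):
    """Return the highest page number found in the pagination links (1 if none)."""
    soup = BeautifulSoup(html, "html.parser")
    numbers = [page_number(a["href"]) for a in soup.select('a[href*="paged="]')]
    return max((n for n in numbers if n is not None), default=1)


def page_urls(base, params, last):
    """Yield one URL per page, from 1 to `last`, by setting `paged`."""
    for page in range(1, last + 1):
        yield f"{base}?{urlencode({**params, 'paged': page})}"


if __name__ == "__main__":
    last = last_page(SAMPLE_HTML)
    print("last page:", last)
    # Show only the first three URLs to keep the output short.
    for url in list(page_urls("https://www.adweek.com/", {"s": "interpublic", "orderby": "date"}, last))[:3]:
        print(url)
```

With real data, each generated URL could be passed to requests.get and parsed the same way as the first page.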

[Image: page source without JavaScript]

Solution 1: 0 votes, accepted, 2020-07-24 17:39:50
