
Can't get rid of a loop when certain conditions are met

I've created a script in Python to get the first 400 links of search results from Bing. There is no guarantee that there will always be at least 400 results; in this case the number of results is around 300. There are 10 results on its landing page, and the rest can be found by traversing the next pages. The problem is that when there is no more next-page link, the webpage displays the last results over and over again.

The search keyword is michael jackson, and this is a full-fledged link.

How can I break out of the loop when there are no more new results or there are fewer than 400 results?

I've tried with:

import time
import requests
from bs4 import BeautifulSoup

link = "https://www.bing.com/search?"

params = {'q': 'michael jackson','first': ''}

def get_bing_results(url):
    q = 1
    while q<=400:
        params['first'] = q
        res = requests.get(url,params=params,headers={
            "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
            })
        soup = BeautifulSoup(res.text,"lxml")
        for link in soup.select("#b_results h2 > a"):
            print(link.get("href"))

        time.sleep(2)
        q+=10

if __name__ == '__main__':
    get_bing_results(link)

As I mentioned in the comments, couldn't you do something like this?

import time
import requests
from bs4 import BeautifulSoup

link = "https://www.bing.com/search?"

params = {'q': 'michael jackson','first': ''}

def get_bing_results(url):
    q = 1
    prev_soup = ""  # markup of the previous page, used to detect a repeat
    while q <= 400:
        params['first'] = q
        res = requests.get(url,params=params,headers={
            "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
            })
        soup = BeautifulSoup(res.text,"lxml")
        # Only print links if this page differs from the previous one
        if str(soup) != prev_soup:
            for link in soup.select("#b_results h2 > a"):
                print(link.get("href"))
            prev_soup = str(soup)
        else:
            # Same page served again: no more results, so stop
            break
        time.sleep(2)
        q+=10

if __name__ == '__main__':
    get_bing_results(link)
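
Comparing the whole page as a string may fail if the markup changes between requests even when the results do not. A variation on the same idea is to compare the extracted links instead and stop as soon as a page contributes nothing new. The sketch below is only an illustration of that stopping rule: `collect_links`, `fetch_page`, and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that imitates a site with 25 results that keeps repeating its last page.

```python
from typing import Callable, Iterable, List


def collect_links(fetch_page: Callable[[int], Iterable[str]],
                  limit: int = 400,
                  page_size: int = 10) -> List[str]:
    """Collect up to `limit` unique links; stop when a page adds nothing new."""
    seen = set()
    ordered = []
    offset = 1  # mirrors the 1-based `first` parameter used in the question
    while len(ordered) < limit:
        new_on_page = 0
        for href in fetch_page(offset):
            if href and href not in seen:
                seen.add(href)
                ordered.append(href)
                new_on_page += 1
                if len(ordered) == limit:
                    break
        if new_on_page == 0:
            break  # repeated or empty page: there are no more results
        offset += page_size
    return ordered


def fake_fetch(offset: int) -> List[str]:
    """Hypothetical stand-in for a real request: 25 results, last page repeats."""
    results = [f"https://example.com/{i}" for i in range(1, 26)]
    page = results[offset - 1: offset + 9]
    return page if page else results[-10:]


if __name__ == "__main__":
    links = collect_links(fake_fetch)
    print(len(links))
```

In the real script, `fetch_page` would wrap the `requests.get` call and the `soup.select("#b_results h2 > a")` lookup from the code above, return the `href` values for one page, and keep the `time.sleep(2)` pause between requests.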
