Can't get rid of a loop when certain conditions are met
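I've created a script in Python to get the first 400 links of search results from Bing. It's not certain that there will always be at least 400 results; in this case the number of results is around 300. The landing page shows 10 results, and the rest can be found by traversing the next pages. The problem is that when there are no more next-page links, the site displays the last page of results over and over again.
The search keyword is michael jackson, and this is a full-fledged link.
How can I get rid of the loop when there are no more new results or when there are fewer than 400 results?
I've tried with: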
import time
import requests
from bs4 import BeautifulSoup

link = "https://www.bing.com/search?"
params = {'q': 'michael jackson', 'first': ''}

def get_bing_results(url):
    q = 1
    while q <= 400:
        params['first'] = q
        res = requests.get(url, params=params, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
        })
        soup = BeautifulSoup(res.text, "lxml")
        for link in soup.select("#b_results h2 > a"):
            print(link.get("href"))

        time.sleep(2)
        q += 10

if __name__ == '__main__':
    get_bing_results(link)
As I mentioned in the comments, couldn't you do something like this:
import time
import requests
from bs4 import BeautifulSoup

link = "https://www.bing.com/search?"
params = {'q': 'michael jackson', 'first': ''}

def get_bing_results(url):
    q = 1
    prev_soup = ""  # HTML of the previously fetched page
    while q <= 400:
        params['first'] = q
        res = requests.get(url, params=params, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
        })
        soup = BeautifulSoup(res.text, "lxml")
        if str(soup) != prev_soup:
            # The page differs from the previous one, so print its links
            for link in soup.select("#b_results h2 > a"):
                print(link.get("href"))
            prev_soup = str(soup)
        else:
            # Same page served again: no more results, leave the loop
            break

        time.sleep(2)
        q += 10

if __name__ == '__main__':
    get_bing_results(link)
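
One possible caveat with comparing the whole page: the raw HTML may differ between requests even when the visible results are the same, in which case the comparison would never trigger the break. A variation, sketched below under that assumption, compares the extracted links instead and stops as soon as a page contributes nothing new. The names `collect_links` and `fetch_page` are hypothetical and not part of the original answer; `fetch_page` stands in for any function that takes the `first` offset and returns the list of hrefs found on that page.

```python
from typing import Callable, List


def collect_links(fetch_page: Callable[[int], List[str]],
                  limit: int = 400, step: int = 10) -> List[str]:
    """Collect up to `limit` unique links, stopping when a page adds nothing new.

    `fetch_page` is a hypothetical callable: it receives the `first` offset
    and returns the hrefs found on that results page.
    """
    seen = set()
    ordered = []
    first = 1
    while len(ordered) < limit:
        added = 0
        for href in fetch_page(first):
            # Skip empty hrefs and anything already collected
            if href and href not in seen:
                seen.add(href)
                ordered.append(href)
                added += 1
        if added == 0:
            # The page repeated old results (or was empty): stop paging
            break
        first += step
    return ordered[:limit]
```

To use it with the code above, `fetch_page` could wrap the existing `requests.get` call and return `[a.get("href") for a in soup.select("#b_results h2 > a")]`, with the `time.sleep(2)` delay kept inside that wrapper.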