How to iterate over links and visit the one at a specific position?

Question

I'm working on an assignment where I need to use BeautifulSoup to parse this page: http://python-data.dr-chuck.net/known_by_Fikret.html

Basically, I need to print the initial URL, find the URL at position 3, visit that URL, find the link at position 3 on that page, and so on, four times in total.
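"Position 3" here counts from 1, so it corresponds to index 2 in a Python list of links. As a minimal sketch of just the "find the link at position N" step, the snippet below uses the standard library's html.parser.HTMLParser instead of BeautifulSoup (a swapped-in technique, chosen only so the example has no third-party dependency). AnchorCollector, link_at_position and the sample HTML are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))


def link_at_position(html, position):
    """Return the href of the anchor at a 1-based position."""
    collector = AnchorCollector()
    collector.feed(html)
    if not 1 <= position <= len(collector.hrefs):
        raise IndexError(f"page has only {len(collector.hrefs)} links")
    return collector.hrefs[position - 1]  # position 3 -> index 2


# Hypothetical page content, shaped like the assignment's list of names
sample = """
<ul>
  <li><a href="known_by_A.html">A</a></li>
  <li><a href="known_by_B.html">B</a></li>
  <li><a href="known_by_C.html">C</a></li>
</ul>
"""
print(link_at_position(sample, 3))  # known_by_C.html
```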

Here is my code so far:

# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

#url = input('Enter - ')
url =  "http://py4e-data.dr-chuck.net/known_by_Fikret.html"
timesToRepeat = '4'
positionInput = '3'
#timesToRepeat = input('Repeat how many times?: ')
#positionInput = input('Enter Position: ')
try:
    timesToRepeat = int(timesToRepeat)
    positionInput = int(positionInput)
except:
    print("Please enter a number")
    quit()

# Retrieve all of the anchor tags
totalCount = 0
currentRepetitionCount = 0

html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('a')

#Leave this all alone ^^^^
print("Retrieving: ",url)
for i in range(timesToRepeat):
    html = urllib.request.urlopen(url, context=ctx).read()
    for tag in tags:
        currentRepetitionCount += 1

        if not totalCount >= timesToRepeat:
            if currentRepetitionCount == positionInput:
                #print("current",currentRepetitionCount)
                #print("total",totalCount)
                #print("Retrieving: ",url)
                currentRepetitionCount = 0
                totalCount +=1
                url = tag.get('href', None)

                print("Retrieving: ",url)

I get this:

Retrieving:  http://py4e-data.dr-chuck.net/known_by_Fikret.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Montgomery.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Anona.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Zoe.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Carmyle.html

But what I should get is:

Retrieving: http://py4e-data.dr-chuck.net/known_by_Fikret.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Montgomery.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Mhairade.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Butchi.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Anayah.html

It seems the link never changes: each time it just takes the third position on the initial page, and I can't for the life of me fix it.
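The wrong output can be reproduced offline. The sketch below mirrors the question's counter logic over one fixed list of links; first_page_links and buggy_walk are hypothetical names, and link1, link2, ... are placeholders for the anchors on the first page. Because tags is built once, before the loop, and the html fetched inside the loop is never parsed, the counter simply keeps picking positions 3, 6, 9 and 12 of the first page during the first outer iteration. That is consistent with the Montgomery, Anona, Zoe, Carmyle sequence above.

```python
# Hypothetical placeholders for the anchors on the first page, in order.
first_page_links = [f"link{n}" for n in range(1, 16)]


def buggy_walk(links, times_to_repeat, position):
    """Mirror the question's counter logic over one fixed list of links."""
    picked = []
    total_count = 0
    current_count = 0
    for _ in range(times_to_repeat):
        # The same list every time: the newly fetched page is never parsed.
        for link in links:
            current_count += 1
            if not total_count >= times_to_repeat:
                if current_count == position:
                    current_count = 0
                    total_count += 1
                    picked.append(link)
    return picked


print(buggy_walk(first_page_links, 4, 3))  # ['link3', 'link6', 'link9', 'link12']
```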

Answer 1

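Try to simplify your code and focus on the task itself. The main problem in your version is that tags is built once, from the first page, before the loop; the html fetched inside the loop is never parsed, so every iteration walks the same list of links. For example, a guard like if not totalCount >= timesToRepeat: is not needed.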

Example

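Note that I added +1 to timesToRepeat because the first URL is also requested inside the loop, which avoids repeating the request code before the loop.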

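from bs4 import BeautifulSoup
import requests

url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'
timesToRepeat = 4
positionInput = 3

# The first URL is printed and fetched inside the loop, hence the +1
for _ in range(timesToRepeat + 1):
    print(f'Retrieving: {url}')
    # Parse the current page on every iteration; name the parser explicitly
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # positionInput is 1-based, list indexes are 0-based
    tag = soup.select('a')[positionInput - 1]
    url = tag.get('href')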

Output

Retrieving: http://py4e-data.dr-chuck.net/known_by_Fikret.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Montgomery.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Mhairade.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Butchi.html
Retrieving: http://py4e-data.dr-chuck.net/known_by_Anayah.html
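The +1 works because each iteration prints a URL before fetching it, so four hops need five iterations. A small offline sketch makes the trade-off visible: it replaces the network with a hypothetical in-memory dictionary (FAKE_WEB, make_counting_fetcher, follow_fetch_in_loop and follow_print_first are all made-up names) and counts the simulated requests. Both structures visit the same five URLs, but the fetch-inside-the-loop version also downloads the last page, whose position-3 link is never printed.

```python
# Hypothetical in-memory "web": each page name maps to its links, in order.
FAKE_WEB = {
    "start": ["a1", "a2", "p1", "a4"],
    "p1": ["b1", "b2", "p2"],
    "p2": ["c1", "c2", "p3"],
    "p3": ["d1", "d2", "p4"],
    "p4": ["e1", "e2", "p5"],
}


def make_counting_fetcher(web):
    """Return a get_links function plus a list recording every simulated request."""
    calls = []

    def get_links(url):
        calls.append(url)
        return web[url]

    return get_links, calls


def follow_fetch_in_loop(url, times, position, get_links):
    """Answer's structure: print, then fetch, inside a loop of times + 1."""
    visited = []
    for _ in range(times + 1):
        visited.append(url)  # stands in for print(f'Retrieving: {url}')
        url = get_links(url)[position - 1]
    return visited


def follow_print_first(url, times, position, get_links):
    """Alternative: record the start URL first, then do exactly `times` hops."""
    visited = [url]
    for _ in range(times):
        url = get_links(url)[position - 1]
        visited.append(url)
    return visited


get_a, calls_a = make_counting_fetcher(FAKE_WEB)
get_b, calls_b = make_counting_fetcher(FAKE_WEB)
print(follow_fetch_in_loop("start", 4, 3, get_a))  # ['start', 'p1', 'p2', 'p3', 'p4']
print(follow_print_first("start", 4, 3, get_b))    # ['start', 'p1', 'p2', 'p3', 'p4']
print(len(calls_a), len(calls_b))                  # 5 4
```

The answer's version keeps everything in one short loop at the cost of one extra request; printing the start URL first saves that request, which only matters if the pages are slow or rate-limited.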

Solution 1 | 0 | 2022-06-12 09:34:18