Retrieving links from a web page with BeautifulSoup

Question

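I am trying to extract the link at a given position on a web page, open that link, and then repeat the process a specified number of times. The problem is that I keep getting back the same URL, so it seems my code just pulls the tag, prints it, never opens it, and repeats this X times before exiting.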

I have written and rewritten this code several times, but for the life of me I can't figure it out. Please tell me what I am doing wrong.

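I tried putting the anchor tags in a list, opening the URL at the requested position in the list, then clearing the list and starting the loop again.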

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

#url = input('Enter - ')
url = "http://py4e-data.dr-chuck.net/known_by_Fikret.html"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

count = 0 
url_loop = int(input("Enter how many times to loop through: ")) 
url_pos= int(input("Enter position of URL: "))
url_pos = url_pos - 1

print(url_pos)



# Retrieve all of the anchor tags
tags = soup('a')
while True:
    if url_loop == count:
        break
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    url = tags[url_pos].get('href', None)

    print("Acquiring URL: ", url)

    count = count + 1  

print("final URL:", url)

Answer 1

The anchor tags are extracted only once, from the initial document:

# Retrieve all of the anchor tags
tags = soup('a')

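Inside the loop, a new page is fetched and parsed into `soup`, but `url` is still read from the old `tags` list, so the same link is returned every time. If you re-extract the tags after fetching and parsing each document, they will reflect the most recently fetched page instead of the first one.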
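A minimal sketch of the corrected loop follows, assuming the same page structure and 1-based position input as the question. The function names `make_fetch` and `follow_links`, and the `fetch` parameter, are hypothetical additions: `fetch` only exists so the loop can be exercised with in-memory pages instead of the network. The key change is that `soup('a')` is called on each newly parsed page inside the loop.

```python
import ssl
import urllib.request

from bs4 import BeautifulSoup


def make_fetch():
    """Return a fetch(url) -> bytes function using the question's SSL settings."""
    # Certificate checks are disabled as in the question; acceptable for the
    # py4e practice site, not advisable for real-world scraping.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return lambda url: urllib.request.urlopen(url, context=ctx).read()


def follow_links(start_url, position, repeat, fetch=None):
    """Open the link at 1-based `position` on each page, `repeat` times.

    Returns every URL visited, starting with `start_url`.
    """
    fetch = fetch or make_fetch()
    url = start_url
    visited = [url]
    for _ in range(repeat):
        soup = BeautifulSoup(fetch(url), 'html.parser')
        # Re-extract the anchor tags from the page just downloaded, so the
        # list belongs to the current document, not the first one.
        tags = soup('a')
        url = tags[position - 1].get('href', None)
        print("Acquiring URL:", url)
        visited.append(url)
    return visited
```

Called as `follow_links("http://py4e-data.dr-chuck.net/known_by_Fikret.html", position, repeat)` with the two values read via `input()` as in the question, the last element of the returned list is the final URL.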

Solution 1
0 votes, accepted, 2019-09-16 23:49:35
