retrieve specific links from web page using python and BeautifulSoup
I have been trying to retrieve an href link from a page and use it as a variable for the next href request. But I am stuck at a point where the page has multiple href links with different file extensions (e.g. zip, md5, etc.) and I only need the file with the zip extension. Here is the code I have tried:
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://example.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            print basename

status, response = http.request('http://example.com/%s' % basename)

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            print basename
Try this:
import os

# YOUR CODE here

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            # check the file extension; note that os.path.splitext
            # returns the extension with a leading dot, e.g. '.zip'
            filename, file_extension = os.path.splitext(basename)
            print basename, file_extension
            # skip anything that is not a zip file
            if file_extension.lower() != '.zip':
                continue
            # YOUR LAST CODE
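The same extension filter can be tested without the network or BeautifulSoup at all. The following is a minimal, stdlib-only sketch (Python 3) using `html.parser` to collect only `.zip` hrefs; the sample HTML and the `LinkFilter` class name are hypothetical, and the key point is that `os.path.splitext` yields `'.zip'` with the leading dot:

```python
import os
from html.parser import HTMLParser

class LinkFilter(HTMLParser):
    """Collect href values from <a> tags whose extension is .zip."""

    def __init__(self):
        super().__init__()
        self.zip_links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href is None:
            return
        # splitext returns ('/abc/file', '.zip') -- compare against '.zip'
        _, ext = os.path.splitext(href)
        if ext.lower() == '.zip':
            self.zip_links.append(href)

# hypothetical sample page with mixed file extensions
html = """
<a href="/abc/file.zip">zip</a>
<a href="/abc/file.md5">md5</a>
<a href="/abc/other.ZIP">zip, uppercase</a>
"""

parser = LinkFilter()
parser.feed(html)
print(parser.zip_links)  # ['/abc/file.zip', '/abc/other.ZIP']
```

The case-insensitive comparison (`ext.lower()`) matters because servers often mix `.zip` and `.ZIP` in listings.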