
Retrieve specific links from a web page using Python and BeautifulSoup

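I have been trying to retrieve href links from a page and use each one as a variable to build the next href link. But I am stuck at a point where the page has multiple href links with different file extensions (zip, md5, etc.), and I need only the file with the zip extension. Here is the code I have tried.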

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://example.com')
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            print basename

            # use basename to build the next URL to fetch
            status, response = http.request('http://example.com/%s' % basename)
            for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
                if link.has_key('href'):
                    if '/abc' in link['href']:
                        basename = link['href'].split("/")[11]
                        print basename

Try this:

import os
# YOUR CODE here

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            # check the file extension; os.path.splitext keeps the leading dot
            filename, file_extension = os.path.splitext(basename)
            print basename, file_extension
            if file_extension.lower() != '.zip':
                continue  # skip anything that is not a .zip file
            # YOUR LAST CODE
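
For anyone who wants to test the extension check on its own, here is a minimal sketch in Python 3 that uses only the standard library (the code in this thread is Python 2 with BeautifulSoup 3). The names `HrefCollector`, `zip_links`, and the sample HTML are made up for illustration and are not from the original post. It assumes the links of interest contain `/abc`, as in the question, and that any query string should be ignored when reading the extension.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def zip_links(html, marker="/abc"):
    """Return hrefs that contain `marker` and whose path ends in .zip."""
    collector = HrefCollector()
    collector.feed(html)
    result = []
    for href in collector.hrefs:
        if marker not in href:
            continue
        # Drop any query string, then read the extension.
        # os.path.splitext keeps the leading dot, so compare with ".zip".
        path = urlparse(href).path
        extension = os.path.splitext(path)[1]
        if extension.lower() == ".zip":
            result.append(href)
    return result


# Hypothetical sample page, for illustration only.
sample = (
    '<a href="/abc/build-1.zip">zip</a>'
    '<a href="/abc/build-1.zip.md5">md5</a>'
    '<a href="/abc/build-1.ZIP?x=1">upper</a>'
    '<a href="/other/file.zip">other</a>'
)
print(zip_links(sample))
```

Note that `build-1.zip.md5` is rejected because `os.path.splitext` only looks at the last extension, which is `.md5`. A plain substring test such as `'zip' in href` would let it through.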
