retrieve specific links from web page using python and BeautifulSoup

I have been trying to retrieve an href link from a page and use it as a variable for the next href link. But I am stuck at a point where I have multiple href links with different file extensions (like zip, md5, etc.) and I only need the file with the zip extension. Here is the code I am trying to implement.

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer
http = httplib2.Http()
status, response = http.request('http://example.com')
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            print basename

            # use the basename found above to build the next request
            status, response = http.request('http://example.com/%s' % basename)
            for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
                if link.has_key('href'):
                    if '/abc' in link['href']:
                        basename = link['href'].split("/")[11]
                        print basename

Try it:

import os
# YOUR CODE here

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            # check the file extension (splitext keeps the leading dot, e.g. '.zip')
            filename, file_extension = os.path.splitext(basename)
            print basename, file_extension
            if file_extension.lower() != '.zip':
                # skip everything that is not a zip file
                continue
            # YOUR LAST CODE
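
As a quick sanity check, here is a standalone sketch (the basenames below are made up, not taken from the original pages) showing why the comparison has to be against '.zip' rather than 'zip': os.path.splitext keeps the leading dot in the extension it returns.

import os

# hypothetical basenames, as they might come out of link['href'].split("/")[11]
basenames = ['build-1.2.3.zip', 'build-1.2.3.md5', 'readme.txt']

for basename in basenames:
    filename, file_extension = os.path.splitext(basename)
    # splitext('build-1.2.3.zip') returns ('build-1.2.3', '.zip') -- note the dot
    if file_extension.lower() != '.zip':
        continue
    print(basename)  # only 'build-1.2.3.zip' gets printed

With the dot included in the comparison, only the zip links survive the loop and can then be passed on to the next http.request call.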
