Retrieve specific links from a web page using Python and BeautifulSoup
I have been trying to retrieve an href link from a page and use it as the variable for the next request. But I am stuck at a point where I have multiple href links with different file extensions (e.g. zip, md5, etc.) and I need only the file with the zip extension. Here is the code I have tried:
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://example.com')
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            print basename

status, response = http.request('http://example.com/%s' % basename)
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            print basename
Try this:
import os

# YOUR CODE here
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/abc' in link['href']:
            basename = link['href'].split("/")[11]
            # check the file extension; note os.path.splitext keeps the dot (".zip")
            filename, file_extension = os.path.splitext(basename)
            print basename, file_extension
            if file_extension.lower() != '.zip':
                continue  # skip anything that is not a zip file
            # YOUR LAST CODE (runs only for .zip links)
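For reference, the same extension filtering works on Python 3 with BeautifulSoup 4 (`bs4`) instead of the old BeautifulSoup 3 import. Below is a minimal sketch; the inline HTML and its hrefs are made-up stand-ins for the fetched response, and `zip_links` is a hypothetical helper name:

```python
import os
from bs4 import BeautifulSoup  # BeautifulSoup 4 ("beautifulsoup4" package)

def zip_links(html):
    """Return the hrefs of all <a> tags whose path ends in .zip."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        # os.path.splitext keeps the dot: ".zip", not "zip"
        _, ext = os.path.splitext(a["href"])
        if ext.lower() == ".zip":
            links.append(a["href"])
    return links

# Stand-in for the downloaded page (made-up links)
html = """
<a href="/abc/build/file1.zip">zip</a>
<a href="/abc/build/file1.zip.md5">md5</a>
<a href="/abc/build/readme.txt">txt</a>
"""
print(zip_links(html))  # ['/abc/build/file1.zip']
```

Because `splitext` only looks at the last suffix, `file1.zip.md5` yields `.md5` and is correctly excluded.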