How to get next page link in Python BeautifulSoup?
I have this link:
http://www.brothersoft.com/windows/categories.html
I want to get the links of the items inside the div. Example:
http://www.brothersoft.com/windows/mp3_audio/midi_tools/
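I tried this code:
import urllib.request
from bs4 import BeautifulSoup
url = 'http://www.brothersoft.com/windows/categories.html'
pageHtml = urllib.request.urlopen(url).read()
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(pageHtml, 'html.parser')
# First <a> inside each <div class="brLeft">
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]
for i in sAll:
    print("http://www.brothersoft.com" + i['href'])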
But I only get the output:
http://www.brothersoft.com/windows/mp3_audio/
How do I get the output I need?
The URL http://www.brothersoft.com/windows/mp3_audio/midi_tools/ is not inside a <div class='brLeft'> tag, so the output http://www.brothersoft.com/windows/mp3_audio/ is correct for that selector.
If you want to get the URL you need, change
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]
to
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
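To see the difference between the two selectors without hitting the site, here is a minimal sketch run against a small hand-written snippet. The markup is an assumption that only mirrors the layout described above (category link in brLeft, subcategory links in brRight), the second subcategory path is a placeholder, and links_in is a hypothetical helper name. It also uses find_all('a') rather than find('a'), so every link in the div is returned instead of only the first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written only to mirror the layout described above;
# the real page may be structured differently.
SAMPLE_HTML = """
<div class="brLeft"><a href="/windows/mp3_audio/">MP3 &amp; Audio</a></div>
<div class="brRight">
  <a href="/windows/mp3_audio/midi_tools/">MIDI Tools</a>
  <a href="/windows/mp3_audio/example_subcategory/">Placeholder</a>
</div>
"""

BASE = "http://www.brothersoft.com"


def links_in(html, css_class):
    """Return absolute URLs for every <a href> inside <div class=css_class>."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_=css_class):
        # find_all('a') collects every anchor; find('a') would stop at the first
        for a in div.find_all("a", href=True):
            urls.append(BASE + a["href"])
    return urls


if __name__ == "__main__":
    print(links_in(SAMPLE_HTML, "brLeft"))
    print(links_in(SAMPLE_HTML, "brRight"))
```

If the real page has one link per brRight div, div.find('a') as in the line above gives the same result.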
Update:
An example of getting the information in 'midi_tools':
import urllib.request
from bs4 import BeautifulSoup
url = 'http://www.brothersoft.com/windows/categories.html'
pageHtml = urllib.request.urlopen(url).read()
soup = BeautifulSoup(pageHtml, 'html.parser')
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
for i in sAll:
    suburl = "http://www.brothersoft.com" + i['href']  # a URL such as the 'midi_tools' page
    content = urllib.request.urlopen(suburl).read()
    anosoup = BeautifulSoup(content, 'html.parser')
    ablock = anosoup.find('table', {'id': 'courseTab'})
    if ablock is None:  # no listing table on this page, skip it
        continue
    # Match the class name without a trailing space
    for atr in ablock.findAll('tr', {'class': 'border_bot'}):
        print(atr.find('dt').a.string)  # name
        print("http://www.brothersoft.com" + atr.find('a', {'class': 'tabDownload'})['href'])  # link
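The code above builds absolute URLs by concatenating the host with each href, which only works when every href starts with "/". As a hedged alternative, the standard library's urljoin resolves an href against the URL of the page it came from. The helper name absolute is hypothetical, and the relative and already-absolute hrefs below are illustrative inputs, not values taken from the site.

```python
from urllib.parse import urljoin

# The page the hrefs were scraped from
PAGE_URL = "http://www.brothersoft.com/windows/categories.html"


def absolute(href, page_url=PAGE_URL):
    """Resolve an href found on a page against that page's own URL."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # Root-relative href, as in the answer's code
    print(absolute("/windows/mp3_audio/midi_tools/"))
    # An href that is already absolute passes through unchanged
    print(absolute("http://www.brothersoft.com/windows/mp3_audio/"))
```

In the loop above this would replace "http://www.brothersoft.com" + i['href'] with absolute(i['href']).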