How to get the next-page link in Python BeautifulSoup?

Question

I have this link:

http://www.brothersoft.com/windows/categories.html

I am trying to get the link for the item inside the div. Example:

http://www.brothersoft.com/windows/mp3_audio/midi_tools/

I have tried this code:

import urllib
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'

pageHtml = urllib.urlopen(url).read()

soup = BeautifulSoup(pageHtml)

sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]

for i in sAll:
    print "http://www.brothersoft.com"+i['href']

But I only get this output:

http://www.brothersoft.com/windows/mp3_audio/

How can I get the output I need?

Answer 1

The URL http://www.brothersoft.com/windows/mp3_audio/midi_tools/ is not inside a <div class='brLeft'> tag, so the output http://www.brothersoft.com/windows/mp3_audio/ is correct for the code as written.

To get the URL you want, change

sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]

to

sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
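The difference between the two selectors can be checked offline. Below is a minimal Python 3 sketch, assuming a simplified markup in which a brLeft div holds the top-level category link and a brRight div holds the subcategory links. The SAMPLE_HTML string, the "players" link, and the links_in helper are invented for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "http://www.brothersoft.com"

# Hypothetical, simplified markup; the structure of the real page is assumed.
SAMPLE_HTML = """
<div class="brLeft"><a href="/windows/mp3_audio/">MP3 &amp; Audio</a></div>
<div class="brRight">
  <a href="/windows/mp3_audio/midi_tools/">MIDI Tools</a>
  <a href="/windows/mp3_audio/players/">Players</a>
</div>
"""


def links_in(html, css_class):
    """Return absolute URLs of every <a href> inside <div class=css_class>."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_=css_class):
        for a in div.find_all("a", href=True):
            # urljoin resolves the relative href against the site root
            urls.append(urljoin(BASE, a["href"]))
    return urls


left_links = links_in(SAMPLE_HTML, "brLeft")
right_links = links_in(SAMPLE_HTML, "brRight")
print(left_links)
print(right_links)
```

Note that div.find('a') returns only the first link in each div, whereas find_all('a') collects all of them.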

UPDATE:

An example that gets the info inside a subcategory page such as 'midi_tools':

import urllib 
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'
pageHtml = urllib.urlopen(url).read()
soup = BeautifulSoup(pageHtml)
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
for i in sAll:
    suburl = "http://www.brothersoft.com"+i['href']    #which is a url like 'midi_tools'

    content = urllib.urlopen(suburl).read()
    anosoup = BeautifulSoup(content)
    ablock = anosoup.find('table',{'id':'courseTab'})
    for atr in ablock.findAll('tr',{'class':'border_bot '}):
        print atr.find('dt').a.string      #name
        print "http://www.brothersoft.com" + atr.find('a',{'class':'tabDownload'})['href']   #link
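The code above is Python 2 and assumes every page contains the expected table. Below is a hedged Python 3 sketch of the parsing step only, run against an invented HTML string so it works without network access. SAMPLE_PAGE, the "example-tool" URLs, and the parse_items helper are hypothetical; the markup is reconstructed from the selectors used above and matches the class name border_bot without a trailing space, which is an assumption about the real page.

```python
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "http://www.brothersoft.com"

# Hypothetical markup built from the selectors used above; not the real page.
SAMPLE_PAGE = """
<table id="courseTab">
  <tr class="border_bot">
    <td><dl><dt><a href="/example-tool.html">Example Tool</a></dt></dl></td>
    <td><a class="tabDownload" href="/example-tool-download.html">Download</a></td>
  </tr>
  <tr><td>row without the expected class</td></tr>
</table>
"""


def parse_items(html):
    """Return (name, download_url) pairs found in one subcategory page."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="courseTab")
    if table is None:
        # Page has no listing table; nothing to extract
        return []
    items = []
    for row in table.find_all("tr", class_="border_bot"):
        dt = row.find("dt")
        name_link = dt.find("a") if dt is not None else None
        download = row.find("a", class_="tabDownload", href=True)
        if name_link is None or download is None:
            continue  # skip rows that lack a name or a download link
        name = name_link.get_text(strip=True)
        items.append((name, urljoin(BASE, download["href"])))
    return items


items = parse_items(SAMPLE_PAGE)
for name, link in items:
    print(name, link)
```

In Python 3 each page would be fetched with urllib.request.urlopen(suburl).read() and the result passed to parse_items; the None checks keep the loop from raising an exception on pages that lack the table.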

Answer 1: accepted, score 2, posted 2013-08-22 10:42:16
