
Beautiful Soup can't get news titles

from bs4 import BeautifulSoup
import requests
url ="http://www.basketnews.lt/lygos/59-nacionaline-krepsinio-asociacija/2013/naujienos.html"
r = requests.get(url)
soup = BeautifulSoup(r.text)

naujienos = soup.findAll('a', {'class':'title'})

print naujienos

Here is the important part of the HTML:

<div class="title">

    <a href="/news-73147-rockets-veikiausiai-pasiliks-mchalea.html"></a>
    <span class="feedbacks"></span>

</div>

I get an empty list. Where is my mistake?

EDIT:

Thanks, it worked. Now I want to print the news titles. This is how I am trying to do it:

nba = soup.select('div.title > a')

for i in nba:
   print ""+i.string+"\n"

I get at most 5 titles, and then an error occurs: cannot concatenate 'str' and 'NoneType' objects

soup.findAll('a', {'class':'title'})

This asks for all a tags that themselves have class="title". That is not what you want: in your HTML the class is on the enclosing div, not on the a tag.
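
A minimal sketch of what that call matches, run against a small inline document. The markup is adapted from the snippet in the question; the link text, the second link, and the variable names are made up for illustration, and the explicit "html.parser" argument is an assumption the original code does not make. find_all is the current spelling of findAll in bs4.

```python
from bs4 import BeautifulSoup

# Inline markup modelled on the question's snippet; the link text and the
# second <a> (which carries class="title" itself) are invented for illustration.
html = """
<div class="title">
    <a href="/news-73147-rockets-veikiausiai-pasiliks-mchalea.html">Rockets</a>
    <span class="feedbacks"></span>
</div>
<a class="title" href="/other.html">Other</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches only <a> tags that carry class="title" on the <a> itself.
by_class = soup.find_all("a", {"class": "title"})
print(by_class)
```

Only the second link is returned; the link nested inside div class="title" is skipped because the class sits on its parent.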

I think you want the a tags that are direct children of a tag with class="title". You can try using a CSS selector:

soup.select('div.title > a')
Out[58]: 
[<a href="/news-73150-blatcheas-garantuoju-kad-laimesime.html">Blatche'as: „Garantuoju, kad laimėsime“</a>,
 <a href="/news-73147-rockets-veikiausiai-pasiliks-mchalea.html">„Rockets“ veikiausiai pasiliks McHale’ą
</a>,
# snip lots of other links
]
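
The follow-up error in the question is consistent with how .string behaves: it is None when a tag has no children or more than one child, so concatenating it with a str fails. Here is a minimal sketch, assuming that is the cause here (the live page was not re-checked). get_text() always returns a string, so it avoids the problem. The inline HTML and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one plain link, one with nested markup, one empty.
html = """
<div class="title">
    <a href="/a.html">First title</a>
</div>
<div class="title">
    <a href="/b.html"><b>Second</b> title</a>
</div>
<div class="title">
    <a href="/c.html"></a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.select("div.title > a")

# .string is None for the second link (two children) and the third (no children).
strings = [a.string for a in links]

# get_text() joins all nested text and returns "" for an empty tag.
titles = [a.get_text().strip() for a in links]

for title in titles:
    print(title)
```

If empty titles should be skipped rather than printed as blank lines, filtering with "if title" inside the loop is one option.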


 