
Python and BeautifulSoup URL parse

I have the following code, which I am using to get the title and description from the Red Sox news feed. It works except for one minor detail: the output includes the surrounding tags. How can I eliminate them?

import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://partner.mlb.com/partnerxml/gen/news/rss/bos.xml').read())

title = soup.find('item').title
desc = soup.find('item').description

print "Title: %s " % (title)
print "Summary: %s " % (desc)

This is what it prints:

Title: <title>Shaw or Panda? Hot corner duel heats up</title> 
Summary: <description>With two weeks until Opening Day, the hottest topic in Red Sox camp is the competition at the hot corner between incumbent Pablo Sandoval and the emerging Travis Shaw.</description> 
>>> 

Try:

print "Title: %s " % (title.text)
print "Summary: %s " % (desc.text)

You can do better with BeautifulSoup, but this is the quick way to make it work.
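As a rough illustration of why the tags appear: `soup.find('item').title` returns a Tag object, and formatting it with `%s` uses its string form, which includes the markup. The sketch below assumes Python 3 with BeautifulSoup 4 (`bs4`) and the built-in `html.parser`; `SAMPLE_RSS` is a hypothetical inline stand-in for the live feed, so no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample standing in for the live RSS feed (assumption).
SAMPLE_RSS = """<rss><channel>
<item>
<title>Shaw or Panda? Hot corner duel heats up</title>
<description>With two weeks until Opening Day, the hottest topic in Red Sox camp is the competition at the hot corner.</description>
</item>
</channel></rss>"""

soup = BeautifulSoup(SAMPLE_RSS, "html.parser")
item = soup.find("item")

# item.title is a Tag; converting it to a string keeps the <title> markup.
title_tag = item.title
print("Tag as string: %s" % title_tag)

# get_text() (or the .text property) returns only the text inside the tag.
title_text = title_tag.get_text()
desc_text = item.description.get_text(strip=True)

print("Title: %s" % title_text)
print("Summary: %s" % desc_text)
```

Both `.text` and `.get_text()` give the same result here; `get_text(strip=True)` additionally trims surrounding whitespace.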

print ("Title: %s " % (title.get_text()))
print ("Summary: %s " % (desc.get_text()))

This works.


 