As the title says, I'm trying to build a small application that aggregates RSS feeds from different blogs. I'm exploring feedparser for this, but I'm stuck trying to write a piece of code that detects the RSS feed.
Most people would just enter www.mysite.com/blog, which is not the actual URL of the RSS feed. Is there a way for me to detect the RSS feed? I'm trying to replicate the browser behavior, where the browser can find the RSS URL for a page.
Any ideas?
Browsers use RSS feed auto-discovery and Atom feed auto-discovery to find feeds on a given web page.
For example, the django question lists are available via an Atom feed which is linked in the HTML header of the associated pages with:
<link rel="alternate" type="application/atom+xml" title="Feed of questions tagged python" href="/feeds/tag/python" />
You'll need to parse out the <link rel="alternate"> tags in a given page to discover these; anything with an application/atom+xml or application/rss+xml type fits.
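The tag matching described above can be sketched with only the standard library. This is a minimal sketch, not the way browsers actually implement it; the names FeedLinkParser and find_feed_hrefs are made up for illustration, and it assumes the page HTML is already available as a string.

```python
from html.parser import HTMLParser

# MIME types that mark a <link> as a feed, as named in the answer above
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of feed auto-discovery <link> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.feed_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        mime_type = attr.get("type", "").strip().lower()
        if "alternate" in rel_tokens and mime_type in FEED_TYPES and attr.get("href"):
            self.feed_hrefs.append(attr["href"])


def find_feed_hrefs(html_doc):
    """Return the href of every RSS/Atom auto-discovery link, in document order."""
    parser = FeedLinkParser()
    parser.feed(html_doc)
    parser.close()
    return parser.feed_hrefs
```

Calling find_feed_hrefs on a page containing the <link> tag shown earlier would return a list holding "/feeds/tag/python". The hrefs come back exactly as written in the page, so they may still be relative.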
Use something like BeautifulSoup to parse the HTML document and look for the RSS feeds. The following is a basic example and not necessarily the most efficient:
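from bs4 import BeautifulSoup

# html_doc is assumed to hold the HTML source of the page the user entered
soup = BeautifulSoup(html_doc, "html.parser")
# Select the RSS auto-discovery <link> tags
rss_links = soup.select('link[type="application/rss+xml"]')
for link in rss_links:
    rss_url = link.get('href')
    print(rss_url)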
See the full BeautifulSoup documentation.
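Note that the href in a feed link can be relative, like /feeds/tag/python in the example tag, and that users may type an address without a scheme, like www.mysite.com/blog. A rough sketch of handling both with the standard library follows; normalize_page_url and resolve_feed_url are hypothetical helper names, and defaulting to http:// is an assumption, not something the question specifies.

```python
from urllib.parse import urljoin


def normalize_page_url(user_input):
    """Prepend a scheme when the user typed something like www.mysite.com/blog."""
    url = user_input.strip()
    if "://" not in url:
        # Assumption: fall back to plain http when no scheme is given
        url = "http://" + url
    return url


def resolve_feed_url(page_url, href):
    """Turn a possibly relative feed href into an absolute URL."""
    return urljoin(page_url, href)


page_url = normalize_page_url("www.mysite.com/blog")
feed_url = resolve_feed_url(page_url, "/feeds/tag/python")
print(feed_url)
```

The resulting absolute URL is what you would then hand to feedparser.parse().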