Building blog RSS feeds using Django (Python)

As the title says, I am trying to build a small application that will aggregate RSS feeds from different blogs. I am testing and exploring feedparser for this, but I am stuck trying to write a piece of code that would detect the RSS feed.

Most people would just enter www.mysite.com/blog, which is not exactly the URL of the RSS feed. Is there a way for me to detect the RSS feed? I am trying to replicate the browser behavior, where it can see the RSS URL.

Any ideas?

Browsers use RSS feed auto-discovery and Atom feed auto-discovery to find feeds on a given web page.
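Auto-discovery assumes the URL points at an HTML page, but a user may also paste a feed URL directly. A first step could therefore be to check whether the fetched document is already a feed. The sketch below is an assumption of how that check might look (the function name `sniff_feed_type` is hypothetical): it parses the body with the standard library and inspects the root element, which is `<rss>` for RSS 2.0, `<feed>` in the Atom namespace for Atom, and `<rdf:RDF>` for RSS 1.0.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "{http://www.w3.org/2005/Atom}"
RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def sniff_feed_type(body: bytes) -> Optional[str]:
    """Hypothetical helper: return 'rss', 'atom' or 'rdf' if body is a feed, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Most HTML pages are not well-formed XML, so they end up here.
        return None
    if root.tag == "rss":
        return "rss"
    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == RDF_NS + "RDF":
        return "rdf"  # RSS 1.0
    # Well-formed XML (e.g. XHTML) that is not a feed
    return None
```

If this returns `None`, fall back to looking for `<link rel="alternate">` tags in the page as described below.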

For example, the question lists are available via an Atom feed which is linked in the HTML header of the associated pages with:

<link rel="alternate" type="application/atom+xml" title="Feed of questions tagged python" href="/feeds/tag/python" />
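Note that the `href` in that tag is relative, so it has to be resolved against the URL of the page it was found on before it can be fetched. A small sketch with the standard library's `urljoin`; the page URL is an assumed example:

```python
from urllib.parse import urljoin

# Hypothetical page URL where the <link> tag above was found
page_url = "https://stackoverflow.com/questions/tagged/python"
href = "/feeds/tag/python"  # href value copied from the tag above

feed_url = urljoin(page_url, href)
print(feed_url)  # https://stackoverflow.com/feeds/tag/python

# Path-relative hrefs depend on the trailing slash of the page URL
print(urljoin("https://example.com/blog/", "feed.xml"))  # https://example.com/blog/feed.xml
print(urljoin("https://example.com/blog", "feed.xml"))   # https://example.com/feed.xml
```

The last two lines matter for input like www.mysite.com/blog: resolve against the URL the page was actually served from (after any redirects), not the string the user typed.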

You'll need to parse out the <link rel="alternate"> tags in a given page to discover these; anything with an application/atom+xml or application/rss+xml type fits.
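If you would rather not add a dependency, the same rule can be applied with the standard library's `html.parser`. This is a minimal sketch, not a complete implementation of the auto-discovery spec; the names `FeedLinkParser` and `discover_feeds` are hypothetical, and it ignores things like a `<base href>` tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect <link rel="alternate"> tags whose type is an RSS or Atom MIME type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # list of (type, title, absolute_url)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "link":
            return
        a = dict(attrs)
        # rel may hold several space-separated tokens; valueless attributes are None
        rels = (a.get("rel") or "").lower().split()
        ftype = (a.get("type") or "").strip().lower()
        href = a.get("href")
        if "alternate" in rels and ftype in FEED_TYPES and href:
            self.feeds.append((ftype, a.get("title"), urljoin(self.base_url, href)))


def discover_feeds(html_text, base_url):
    """Return feeds advertised in html_text, with hrefs resolved against base_url."""
    parser = FeedLinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.feeds
```

Self-closing tags such as `<link ... />` are handled too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.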

Use something like BeautifulSoup to parse the HTML document and look for the RSS feeds. The following is a basic example and not necessarily the most efficient:

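from bs4 import BeautifulSoup

# html_doc holds the HTML source of the blog page
soup = BeautifulSoup(html_doc, "html.parser")

# Match both RSS and Atom <link> tags
feed_links = soup.select('link[type="application/rss+xml"], '
                         'link[type="application/atom+xml"]')
# Collect every href instead of overwriting one variable per iteration
feed_urls = [link.get("href") for link in feed_links]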

See the full BeautifulSoup documentation.

There is a great app for exactly this, called Feedjack.

But you will find yourself banging your head against the wall when an RSS feed contains fewer than 100 characters per entry.
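One way to spot such feeds before relying on them is to measure the text of each entry. The sketch below assumes an RSS 2.0 feed and takes the 100-character threshold from the answer; the function name `short_rss_items` is hypothetical, and the regex tag-stripping is deliberately crude, only good enough for a length check.

```python
import re
import xml.etree.ElementTree as ET

CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def short_rss_items(rss_bytes, min_chars=100):
    """Hypothetical helper: titles of RSS 2.0 <item>s whose text is shorter than min_chars."""
    root = ET.fromstring(rss_bytes)
    short = []
    for item in root.iter("item"):
        # Prefer full content (content:encoded) when present, else fall back to description
        content = item.findtext(CONTENT_ENCODED)
        body = content if content is not None else (item.findtext("description") or "")
        # Strip HTML tags so only the visible text is counted
        text = re.sub(r"<[^>]+>", "", body).strip()
        if len(text) < min_chars:
            short.append(item.findtext("title") or "")
    return short
```

If most entries come back as short, the feed probably only carries excerpts, and the full posts would have to be fetched from each entry's link.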

For full control (aggregating exactly what you need), and for websites without any RSS feed at all, I would recommend Scrapy.
