
How do I retrieve all RSS entries that are no more than X days old?

I am using Python and the feedparser module to retrieve RSS entries. However, I only want to retrieve a news item if it is no more than x days old.

For example, if x=4, my Python code should not fetch anything more than four days older than the current date.

feedparser exposes the 'published' date of each entry, but it is of type unicode and I don't know how to convert it into a datetime object.

Here is some example input:

date = 'Thu, 29 May 2014 20:39:20 +0000'

Here is what I have tried:

from datetime import datetime
date_object = datetime.strptime(date, '%a, %d %b %Y %H:%M:%S %z')

This is the error I get:

ValueError: 'z' is a bad directive in format '%a, %d %b %Y %H:%M:%S %z'

This is what I hope to do with it:

from datetime import datetime
a = datetime(today)                # pseudocode: the current date
b = datetime(RSS_feed_entry_date)  # pseudocode: the entry's published date
>>> a - b
datetime.timedelta(6, 1)
>>> (a - b).days
6

You already have this as a time.struct_time: look at f.entries[0].published_parsed.

You can use time.mktime to convert it to a timestamp and compare that with time.time() to see how far in the past it is.

An example:

>>> import feedparser
>>> import time

>>> f = feedparser.parse("http://feeds.bbci.co.uk/news/rss.xml")
>>> f.entries[0].published_parsed
time.struct_time(tm_year=2014, tm_mon=5, tm_mday=30, tm_hour=14, tm_min=6, tm_sec=8, tm_wday=4, tm_yday=150, tm_isdst=0)

>>> time.time() - time.mktime(f.entries[0].published_parsed)
4985.511506080627

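Obviously this will be a different value for you, but if it is less than (in your case) 86400 * 4 (the number of seconds in 4 days), the entry is recent enough. Note that time.mktime interprets the tuple as local time, whereas feedparser's parsed dates are in UTC, so the result can be off by your UTC offset; calendar.timegm avoids this.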

So, concisely:

[entry for entry in f.entries if time.time() - time.mktime(entry.published_parsed) < (86400*4)]

gives you your list.
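The same filter can be wrapped in a small function. This is only a sketch: recent_entries and SECONDS_PER_DAY are names made up for illustration, and the entries in the demo are stand-in objects rather than real feedparser results. It swaps time.mktime for calendar.timegm, which reads the tuple as UTC, and it skips entries that have no published_parsed at all (an assumption about how you would want missing dates handled).

```python
import calendar
import time
from types import SimpleNamespace

SECONDS_PER_DAY = 86400  # hypothetical constant, for readability


def recent_entries(entries, max_age_days, now=None):
    """Return the entries published no more than max_age_days ago."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    recent = []
    for entry in entries:
        parsed = getattr(entry, "published_parsed", None)
        if parsed is None:
            continue  # assumption: entries without a usable date are dropped
        # timegm treats the struct_time as UTC; mktime would treat it as local time
        if calendar.timegm(parsed) >= cutoff:
            recent.append(entry)
    return recent


# Stand-in entries; with feedparser these would come from f.entries.
now = calendar.timegm((2014, 6, 2, 12, 0, 0, 0, 0, 0))
fresh = SimpleNamespace(title="fresh", published_parsed=time.gmtime(now - 1 * SECONDS_PER_DAY))
stale = SimpleNamespace(title="stale", published_parsed=time.gmtime(now - 5 * SECONDS_PER_DAY))
undated = SimpleNamespace(title="undated")

print([e.title for e in recent_entries([fresh, stale, undated], 4, now=now)])  # ['fresh']
```

With a real feed you would call recent_entries(f.entries, 4) and leave now at its default.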

from datetime import datetime

date = 'Thu, 29 May 2014 20:39:20 +0000'
# Split off the trailing UTC offset, since strptime rejects %z here.
restOfDate, sep, offset = date.rpartition(' ')
# Parse only the date and time; the offset is ignored, so the result is a naive datetime.
date_object = datetime.strptime(restOfDate, '%a, %d %b %Y %H:%M:%S')
print(date_object)

This yields 2014-05-29 20:39:20. While researching your time zone error, I came across another SO question saying that strptime has trouble with time zones.
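If you are on Python 3.3 or later (an assumption; the question looks like Python 2), the standard library can parse this RFC 2822 style date including its offset, so no string splitting is needed. A minimal sketch follows; age_in_days is a hypothetical helper name, and treating a date without a usable offset as UTC is my assumption, not something the feed guarantees.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def age_in_days(published, now=None):
    """Return how many whole days old an RFC 2822 date string is."""
    if now is None:
        now = datetime.now(timezone.utc)
    published_dt = parsedate_to_datetime(published)
    if published_dt.tzinfo is None:
        # assumption: a date without a usable offset is treated as UTC
        published_dt = published_dt.replace(tzinfo=timezone.utc)
    return (now - published_dt).days


date = 'Thu, 29 May 2014 20:39:20 +0000'
fixed_now = datetime(2014, 6, 4, 20, 39, 21, tzinfo=timezone.utc)
print(age_in_days(date, now=fixed_now))  # 6
```

An entry would then be kept when age_in_days(entry.published) is less than x.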
