Why is my RFC 2822 date not parsed by chrono?

I'm writing some code to parse RSS feeds, but I have trouble with the Abstruse Goose RSS feed. In that feed, dates are encoded as Mon, 06 Aug 2018 00:00:00 UTC. To me, that looks like RFC 2822.

I tried to parse it using chrono's DateTime::parse_from_rfc2822, but I get ParseError(NotEnough).

let pub_date = entry.pub_date().unwrap().to_owned();
return rfc822_sanitizer::parse_from_rfc2822_with_fallback(&pub_date)
    .unwrap_or_else(|e| {
        panic!(
            "pub_date for item {:?} (value is {:?}) can't be parsed due to error {:?}",
            &entry, pub_date, e
        )
    })
    .naive_utc();

Is there something I'm doing wrong? Do I have to hack it some way?

I use rfc822_sanitizer, which does a good job of fixing malformed dates (most of the time). I don't think it affects the parsing, but who knows?

RFC 2822 specifies the date/time format precisely, using the following grammar:

date-time       =       [ day-of-week "," ] date FWS time [CFWS]
day-of-week     =       ([FWS] day-name) / obs-day-of-week
day-name        =       "Mon" / "Tue" / "Wed" / "Thu" /
                        "Fri" / "Sat" / "Sun"
date            =       day month year
year            =       4*DIGIT / obs-year
month           =       (FWS month-name FWS) / obs-month
month-name      =       "Jan" / "Feb" / "Mar" / "Apr" /
                        "May" / "Jun" / "Jul" / "Aug" /
                        "Sep" / "Oct" / "Nov" / "Dec"
day             =       ([FWS] 1*2DIGIT) / obs-day
time            =       time-of-day FWS zone
time-of-day     =       hour ":" minute [ ":" second ]
hour            =       2DIGIT / obs-hour
minute          =       2DIGIT / obs-minute
second          =       2DIGIT / obs-second
zone            =       (( "+" / "-" ) 4DIGIT) / obs-zone

Where obs-zone is defined as follows:

obs-zone        =       "UT" / "GMT" /          ; Universal Time
                                                ; North American UT
                                                ; offsets
                        "EST" / "EDT" /         ; Eastern:  - 5/ - 4
                        "CST" / "CDT" /         ; Central:  - 6/ - 5
                        "MST" / "MDT" /         ; Mountain: - 7/ - 6
                        "PST" / "PDT" /         ; Pacific:  - 8/ - 7
                        %d65-73 /               ; Military zones - "A"
                        %d75-90 /               ; through "I" and "K"
                        %d97-105 /              ; through "Z", both
                        %d107-122               ; upper and lower case
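
To make the zone rule concrete, here is a minimal sketch of a checker for the zone production quoted above. The function name is_rfc2822_zone is hypothetical and not part of chrono or any other library. It compares zone names case-insensitively, on the assumption that ABNF string literals are case-insensitive.

```rust
/// Returns true if `token` matches the `zone` rule quoted above:
/// a signed four-digit offset, or one of the obsolete zone names.
/// (Hypothetical helper for illustration; not part of any library.)
fn is_rfc2822_zone(token: &str) -> bool {
    let bytes = token.as_bytes();

    // (( "+" / "-" ) 4DIGIT)
    if bytes.len() == 5 && (bytes[0] == b'+' || bytes[0] == b'-') {
        return bytes[1..].iter().all(|b| b.is_ascii_digit());
    }

    // Named obs-zone alternatives; ABNF literals are case-insensitive.
    const NAMED: [&str; 10] = [
        "UT", "GMT", "EST", "EDT", "CST", "CDT", "MST", "MDT", "PST", "PDT",
    ];
    if NAMED.iter().any(|name| name.eq_ignore_ascii_case(token)) {
        return true;
    }

    // Military zones: a single letter A-I or K-Z, in either case ("J" is excluded).
    if bytes.len() == 1 {
        let upper = bytes[0].to_ascii_uppercase();
        return upper.is_ascii_uppercase() && upper != b'J';
    }

    false
}
```

With this sketch, is_rfc2822_zone("UT"), is_rfc2822_zone("GMT") and is_rfc2822_zone("+0000") return true, while is_rfc2822_zone("UTC") returns false.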

Something a lot of people get wrong when rolling their own timestamp generation is exactly this point: how to label the time zone in an RFC 2822 date. UTC is not among the zones the grammar allows, so Mon, 06 Aug 2018 00:00:00 UTC is not a valid RFC 2822 date; it would have to end in a zone the grammar accepts, such as UT, GMT, or +0000. The RFC says UT rather than UTC because the two are not exactly the same: UTC has leap seconds, while UT has four variants, all subtly different, and the RFC does not define which one is meant.
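
If the feed cannot be fixed at the source, one possible workaround is to rewrite the offending label before parsing. The sketch below is only an illustration: normalize_utc_zone is a hypothetical helper, and it assumes that treating a trailing "UTC" as a zero offset is acceptable for your data.

```rust
/// Rewrites a trailing " UTC" zone label to the numeric offset " +0000".
/// Other inputs are returned as-is, apart from trimmed trailing whitespace.
/// (Hypothetical helper for illustration; not part of any library.)
fn normalize_utc_zone(date: &str) -> String {
    let trimmed = date.trim_end();
    match trimmed.strip_suffix(" UTC") {
        // Assumption: "UTC" is meant as a zero offset from Universal Time.
        Some(prefix) => format!("{} +0000", prefix),
        None => trimmed.to_string(),
    }
}
```

The normalized string can then be handed to chrono's DateTime::parse_from_rfc2822 in place of the raw value from the feed.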
