How can I sort XML entries with LibXML and Perl?

I'm parsing an XML file with LibXML and need to sort the entries by date. Each entry has two date fields, one for when the entry was published and one for when it was updated.

<?xml version="1.0" encoding="utf-8"?>
...
<entry>
  <published>2009-04-10T18:51:04.696+02:00</published>
  <updated>2009-05-30T14:48:27.853+03:00</updated>
  <title>The title</title>
  <content>The content goes here</content>
</entry>
...

The XML file is already ordered by date updated, with the most recent first. I can easily reverse that to put the older entries first:

my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $xc = XML::LibXML::XPathContext->new($doc->documentElement());

foreach my $entry (reverse($xc->findnodes('//entry'))) {
  ...
}

However, I need to reverse sort the file by date published, not by date updated. How can I do that? The timestamp looks a little wonky too. Would I need to normalize that first?

Thanks!

Update: After fiddling around with XPath namespaces and failing, I made a function that parsed the XML and stored the values I needed in a hash. I then used a bare sort to sort the hash, which works just fine now.

One way would be changing your reverse to a sort statement (untested):

sub parse_date {
    # Transforms a date from 2009-04-10T18:51:04.696+02:00 to 20090410
    # (day precision only; the time of day and UTC offset are ignored).
    my $date = shift;
    return join "", $date =~ m!\A(\d{4})-(\d{2})-(\d{2})!;
}

sub by_published_date {
    # getChildrenByTagName returns nodes, so take the first one's text.
    my ($a_node) = $a->getChildrenByTagName('published');
    my ($b_node) = $b->getChildrenByTagName('published');
    my $a_published = parse_date( $a_node->textContent );
    my $b_published = parse_date( $b_node->textContent );

    # Putting $b_published first gives descending order.
    return $b_published <=> $a_published;
}

foreach my $entry ( sort by_published_date $xc->findnodes('//entry') ) {
    ...
}

Hope this helps a bit!
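As a rough sketch of the same idea, the sort key can be computed once per entry (a Schwartzian transform) instead of on every comparison. The sample XML, the <feed> root element and the day_key helper are made up for illustration, and the sketch assumes the elements are not in a default namespace; if they are, the XPath expressions would need a prefix registered on an XML::LibXML::XPathContext.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; the real file is not shown in full.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<feed>
  <entry>
    <published>2009-04-10T18:51:04.696+02:00</published>
    <title>Older</title>
  </entry>
  <entry>
    <published>2009-05-02T09:00:00.000+02:00</published>
    <title>Newer</title>
  </entry>
</feed>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Day-precision key, e.g. 20090410 (time of day and offset are ignored).
sub day_key {
    my ($text) = @_;
    return join "", $text =~ m!\A(\d{4})-(\d{2})-(\d{2})!;
}

# Decorate each entry with its key, sort descending, then undecorate.
my @sorted =
    map  { $_->[1] }
    sort { $b->[0] <=> $a->[0] }
    map  { [ day_key($_->findvalue('published')), $_ ] }
    $doc->findnodes('//entry');

print $_->findvalue('title'), "\n" for @sorted;
```

With this sample the entry titled "Newer" is printed before "Older". Entries published on the same day compare as equal here, so their relative order is not defined by this key.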

A bare sort may put times from different timezones out of order:

print "$_\n" for sort "2009-06-15T08:00:00+07:00", "2009-06-15T04:00:00+00:00";

Here, the second time is 3 hours after the first, but sorts first.
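One possible way around this is to convert each timestamp to UTC epoch seconds before comparing. The sketch below uses only the core Time::Local module; rfc3339_to_epoch is a hypothetical helper name, and it assumes well-formed timestamps with an explicit offset or "Z" and no leap seconds.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert an RFC 3339 timestamp to epoch seconds in UTC,
# keeping any fractional seconds.
sub rfc3339_to_epoch {
    my ($ts) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $frac, $tz) = $ts =~
        m/\A(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})(\.\d+)?([Zz]|[+-]\d{2}:\d{2})\z/
        or die "Not an RFC 3339 timestamp: $ts";

    # Treat the fields as UTC first, then correct for the offset.
    my $epoch = timegm($s, $mi, $h, $d, $mo - 1, $y);
    $epoch += $frac if defined $frac;

    if ($tz !~ /\A[Zz]\z/) {
        my ($sign, $oh, $om) = $tz =~ /\A([+-])(\d{2}):(\d{2})\z/;
        my $offset = ($oh * 60 + $om) * 60;
        # A +07:00 local time is 7 hours ahead of UTC, so subtract.
        $epoch -= $sign eq '+' ? $offset : -$offset;
    }
    return $epoch;
}

my @stamps = ("2009-06-15T04:00:00+00:00", "2009-06-15T08:00:00+07:00");
my @sorted = sort { rfc3339_to_epoch($a) <=> rfc3339_to_epoch($b) } @stamps;
print "$_\n" for @sorted;
```

Sorted this way, the +07:00 timestamp (01:00 UTC) comes before the +00:00 one (04:00 UTC). Swapping $a and $b in the sort block gives most recent first.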

I'm not sure what you mean by "wonky". Your example just shows timestamps in RFC 3339 format.
