
Identifying date types in Spreadsheet::ParseExcel

We are migrating from an MS Excel OLE-based module to Spreadsheet::ParseExcel (or similar). Because hundreds of programs use our module, we would prefer to provide a drop-in replacement, i.e. one that returns identical data.

The problem is dates. Using Excel, we get a Win32::OLE::Variant object of type VT_DATE. As a workaround, we can construct this manually by checking $cell->type() eq 'Date' and returning the object.

However, the type is not reliably set, so we can't always do this.

The Date type is set in two places. This is the logic used in FmtDefault.pm:

if (   ( ( $iFmtIdx >= 0x0E ) && ( $iFmtIdx <= 0x16 ) )
    || ( ( $iFmtIdx >= 0x2D ) && ( $iFmtIdx <= 0x2F ) ) )
{
    return "Date";
}

If this check fails and we get Numeric, a backup check runs in ParseExcel.pm:

if ( $FmtStr =~ m{^[dmy][-\\/dmy]*$}i ) {
    $rhKey{Type} = "Date";
}

However, a number of common format strings are not recognized by either check, for example:

[$-C09]dddd\,\ d\ mmmm\ yyyy;@ e.g. Sunday, 24 January 1982
d/m/yyyy;@ e.g. 24/1/1982
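
To make the failure concrete, here is a minimal sketch that runs the backup regex quoted above against these format strings. The sub name backup_check_says_date is made up for illustration, and the format strings are written with single-backslash Excel escapes, which is my assumption about how they are stored in the workbook.

```perl
use strict;
use warnings;

# Backup check as quoted from ParseExcel.pm: the whole format string must
# consist only of d/m/y characters and a few separators.
# (backup_check_says_date is a hypothetical name, not a module function.)
sub backup_check_says_date {
    my ($fmt) = @_;
    return $fmt =~ m{^[dmy][-\\/dmy]*$}i ? 1 : 0;
}

my @formats = (
    'd/m/yyyy',                          # plain format
    'd/m/yyyy;@',                        # trailing ";@" text section
    '[$-C09]dddd\,\ d\ mmmm\ yyyy;@',    # locale prefix and escaped literals
);

# Print the type each format string would end up with.
printf "%-35s %s\n", $_, backup_check_says_date($_) ? 'Date' : 'Numeric'
    for @formats;
```

Only the plain d/m/yyyy passes. The ";@" section and the "[$-C09]" locale prefix both fall outside the character class, so the anchored match fails.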

I've checked the Excel specification at openoffice.org and also read guides such as http://jonvonderheyden.net/excel/a-comprehensive-guide-to-number-formats-in-excel/#date_code and it seems that the following rule will match a date format string:

A string with d, m, or y characters, which are not between "" or [], not preceded with \ unless it's a \\, and not followed by - or *.
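
A rough sketch of that rule, to show how much stripping it takes. The name looks_like_date_format is hypothetical and not part of Spreadsheet::ParseExcel. I am also assuming that the last clause is about Excel's padding and fill codes ("_x" and "*x"), so the sketch removes those pairs rather than checking what follows each letter. Time formats such as h:mm also contain "m" and will be reported as dates here.

```perl
use strict;
use warnings;

# Hypothetical helper: guess whether an Excel number format string is a
# date/time format by removing everything that is literal text and then
# looking for any remaining d, m or y code.
sub looks_like_date_format {
    my ($fmt) = @_;
    return 0 unless defined $fmt && length $fmt;

    # One left-to-right pass, so an escaped quote (\") is consumed as an
    # escape and never opens a quoted string.
    $fmt =~ s{
          \\.            # backslash-escaped character, including \\
        | "[^"]*"        # quoted literal text
        | \[[^\]]*\]     # bracketed section: [Red], [$-C09], [h], ...
        | [_*].          # assumption: padding (_x) and fill (*x) pairs
    }{}gxs;

    return $fmt =~ /[dmy]/i ? 1 : 0;
}

for my $fmt ( '[$-C09]dddd\,\ d\ mmmm\ yyyy;@', 'd/m/yyyy;@', '#,##0.00', '"day" 0' ) {
    printf "%-35s %s\n", $fmt, looks_like_date_format($fmt) ? 'date' : 'not a date';
}
```

Both problem formats above come out as dates, while '#,##0.00' and '"day" 0' do not.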

This seems very complicated and error-prone. Is there a better way?

It seems Spreadsheet::ParseExcel::Utility::ExcelFmt() already flags a date format internally in $format_mode, so perhaps this logic could be modified to return the flag? But I'd prefer something ready to go without changing the Spreadsheet::ParseExcel modules if possible.

Do you know which columns are supposed to be dates?

In my usage, I do, and convert them with:

use Spreadsheet::ParseExcel::Utility ();
use Date::Manip ();

my $date;
my $val = $cell->unformatted();

# If it was properly set as a Date cell, the value will be a serial number
# (days since the 1900 or 1904 epoch) that we can convert to a date,
# regardless of the format it was shown in.
if ( $val =~ /^[0-9]{5}(?:\.[0-9]+)?\z/ ) {
    $date = Spreadsheet::ParseExcel::Utility::ExcelFmt( 'YYYY-MM-DD', $val, $wb->{'Flg1904'} );
}
else {
    # Otherwise fall back to the displayed text, minus any leading apostrophe.
    $val = $cell->value();
    $val =~ s/^'//;

    # Try parsing it with Date::Manip, which handles all common formats
    # (see its ParseDateString doc).
    Date::Manip::Date_Init( "TZ=GMT", "DateFormat=US" );
    $date = Date::Manip::UnixDate( $val, '%Y-%m-%d' );
}
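
For reference, this is roughly what the serial-number conversion amounts to, sketched with core modules only. The name excel_serial_to_iso is hypothetical. The sketch ignores the time-of-day fraction and Excel's fictitious 29 February 1900, so it is only right for 1900-system serials of 61 and above. In real code ExcelFmt is still the thing to call.

```perl
use strict;
use warnings;
use POSIX qw(strftime floor);

# Hypothetical helper: convert an Excel serial day number to YYYY-MM-DD.
# The Unix epoch (1970-01-01) is serial 25569 in the 1900 date system and
# 24107 in the 1904 date system (the two systems differ by 1462 days).
sub excel_serial_to_iso {
    my ( $serial, $is_1904 ) = @_;
    my $unix_epoch_serial = $is_1904 ? 24107 : 25569;

    # Drop the fractional (time-of-day) part and count days from 1970.
    my $days_since_1970 = floor($serial) - $unix_epoch_serial;

    return strftime( '%Y-%m-%d', gmtime( $days_since_1970 * 86400 ) );
}

print excel_serial_to_iso( 29975, 0 ), "\n";    # 1900 system
print excel_serial_to_iso( 28513, 1 ), "\n";    # same day in the 1904 system
```

Both calls print 1982-01-24, the date used in the question.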

Update: sounds like you are best off modifying ExcelFmt, something like this (untested):

--- Utility.pm.orig 2014-12-17 07:16:06.609942082 -0800
+++ Utility.pm  2014-12-17 07:18:14.453965764 -0800
@@ -69,7 +69,7 @@
 #
 sub ExcelFmt {

-    my ( $format_str, $number, $is_1904, $number_type, $want_subformats ) = @_;
+    my ( $format_str, $number, $is_1904, $number_type, $want_subformats, $want_format_mode ) = @_;

     # Return text strings without further formatting.
     return $number unless $number =~ $qrNUMBER;
@@ -956,8 +956,14 @@
     $result =~ s/^\$\-/\-\$/;
     $result =~ s/^\$ \-/\-\$ /;

-    # Return color and locale strings if required.
-    if ($want_subformats) {
+    # Return format mode and/or color and locale strings if required.
+    if ( $want_subformats && $want_format_mode ) {
+        return ( $result, $color, $locale, $format_mode );
+    }
+    elsif ($want_format_mode) {
+        return ( $result, $format_mode );
+    }
+    elsif ($want_subformats) {
         return ( $result, $color, $locale );
     }
     else {

Be sure to submit it to the maintainer for inclusion in a later release.
