Reading a date from xlsx using open xml sdk

I have a date in the format "4/5/2011" (month/day/year) in one of the cells of an xlsx file. I'm trying to parse the file and load the data into some classes.

So far, the part where I parse the cell looks like this:

string cellValue = cell.InnerText;
if (cell.DataType != null)
{
    switch (cell.DataType.Value)
    {
        case CellValues.SharedString:
            // get string from shared string table
            cellValue = this.GetStringFromSharedStringTable(int.Parse(cellValue));
            break;
    }
}

I hoped that a date would have its own cell.DataType. In fact, when parsing the cell with the date "4/5/2011", cell.DataType is null and the value of the cell is "40638", which is not an index into the shared string table. (I tried treating it as one, and it ended in an exception.)

Any ideas? Thanks.

Open XML stores dates as the number of days counted from 1 Jan 1900 (serial number 1), with the non-existent 29 Feb 1900 wrongly counted as a valid day. You should be able to find algorithms that calculate the correct value. I believe some developers use DateTime.FromOADate() as a helper.
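As a rough sketch of that conversion: DateTime.FromOADate counts from 30 Dec 1899, so it agrees with Excel's serial numbers from 1 Mar 1900 (serial 61) onward, and is one day behind for serials 1 to 59 because of the fictitious 29 Feb 1900. The class and method names below are made up for illustration, and the sketch assumes the workbook uses the 1900 date system rather than the 1904 one.

```csharp
using System;

public static class ExcelSerialDates
{
    // Hypothetical helper: converts an Excel 1900-system serial number to a DateTime.
    public static DateTime FromExcelSerial(double serial)
    {
        // Serial 60 is Excel's fictitious 29 Feb 1900 and has no real calendar date.
        if (serial >= 60 && serial < 61)
            throw new ArgumentOutOfRangeException(nameof(serial), "Serial 60 is the non-existent 29 Feb 1900.");

        // Serials 1..59 (Jan and Feb 1900) sit one day behind in OLE Automation dates.
        if (serial >= 1 && serial < 60)
            serial += 1;

        // Values below 1 (time-only cells) and values from 61 upward are passed through unchanged.
        return DateTime.FromOADate(serial);
    }
}
```

For the value from the question, ExcelSerialDates.FromExcelSerial(40638) gives 5 April 2011.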

Also, the DataType property of the Cell class defaults to Number. So if it's null, the cell holds a number, which in our case includes dates.

You only go to the shared strings table when the stored date is before the epoch (1 Jan 1900 in this case). In that case, the CellValue of the Cell class holds the index into the shared string table.

It appears that cell.DataType is not set for dates.

The way to do it is to check whether the cell has a StyleIndex, which is an index into the array of cell formats in the document.

You then use cellFormat.NumberFormatId to see whether this is a date data type.

Here is some code:

    public class ExcelCellWithType
    {
        public string Value { get; set; }
        public UInt32Value ExcelCellFormat { get; set; }
        public bool IsDateTimeType { get; set; }
    }  

    public class ExcelDocumentData
    {
        public ExcelXmlStatus Status { get; set; }
        public IList<Sheet> Sheets { get; set; }
        public IList<ExcelSheetData> SheetData { get; set; }

        public ExcelDocumentData()
        {
            Status = new ExcelXmlStatus();
            Sheets = new List<Sheet>();
            SheetData = new List<ExcelSheetData>();
        }
    } 

    ...

    public ExcelDocumentData ReadSpreadSheetDocument(SpreadsheetDocument mySpreadsheet, ExcelDocumentData data)
    {
        var workbookPart = mySpreadsheet.WorkbookPart;

        data.Sheets = workbookPart.Workbook.Descendants<Sheet>().ToList();

        foreach (var sheet in data.Sheets)
        {
            var sheetData = new ExcelSheetData { SheetName = sheet.Name };
            var workSheet = ((WorksheetPart)workbookPart.GetPartById(sheet.Id)).Worksheet;

            sheetData.ColumnConfigurations = workSheet.Descendants<Columns>().FirstOrDefault();
            var rows = workSheet.Elements<SheetData>().First().Elements<Row>().ToList();
            if (rows.Count > 1)
            {
                foreach (var row in rows)
                {
                    var dataRow = new List<ExcelCellWithType>();

                    var cellEnumerator = GetExcelCellEnumerator(row);
                    while (cellEnumerator.MoveNext())
                    {
                        var cell = cellEnumerator.Current;
                        var cellWithType = ReadExcelCell(cell, workbookPart);
                        dataRow.Add(cellWithType);
                    }

                    sheetData.DataRows.Add(dataRow);
                }
            }
            data.SheetData.Add(sheetData);
        }

        return data;
    }

    ...

    private ExcelCellWithType ReadExcelCell(Cell cell, WorkbookPart workbookPart)
    {
        var cellValue = cell.CellValue;
        var text = (cellValue == null) ? cell.InnerText : cellValue.Text;
        if (cell.DataType?.Value == CellValues.SharedString)
        {
            text = workbookPart.SharedStringTablePart.SharedStringTable
                .Elements<SharedStringItem>().ElementAt(
                    Convert.ToInt32(cell.CellValue.Text)).InnerText;
        }

        var cellText = (text ?? string.Empty).Trim();

        var cellWithType = new ExcelCellWithType();

        if (cell.StyleIndex != null)
        {
            var cellFormat = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ChildElements[
                int.Parse(cell.StyleIndex.InnerText)] as CellFormat;

            if (cellFormat != null)
            {
                cellWithType.ExcelCellFormat = cellFormat.NumberFormatId;

                var dateFormat = GetDateTimeFormat(cellFormat.NumberFormatId);
                if (!string.IsNullOrEmpty(dateFormat))
                {
                    cellWithType.IsDateTimeType = true;

                    if (!string.IsNullOrEmpty(cellText))
                    {
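                        // Parse with the invariant culture: the file always uses '.' as the decimal separator.
                        if (double.TryParse(cellText, System.Globalization.NumberStyles.Float,
                                System.Globalization.CultureInfo.InvariantCulture, out var cellDouble))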
                        {
                            var theDate = DateTime.FromOADate(cellDouble);
                            cellText = theDate.ToString(dateFormat);
                        }
                    }
                }
            }
        }

        cellWithType.Value = cellText;

        return cellWithType;
    }

    //// https://msdn.microsoft.com/en-GB/library/documentformat.openxml.spreadsheet.numberingformat(v=office.14).aspx
    private readonly Dictionary<uint, string> DateFormatDictionary = new Dictionary<uint, string>()
    {
        [14] = "dd/MM/yyyy",
        [15] = "d-MMM-yy",
        [16] = "d-MMM",
        [17] = "MMM-yy",
        [18] = "h:mm tt",
        [19] = "h:mm:ss tt",
        [20] = "h:mm",
        [21] = "h:mm:ss",
        [22] = "M/d/yy h:mm",
        [30] = "M/d/yy",
        [34] = "yyyy-MM-dd",
        [45] = "mm:ss",
        [46] = "[h]:mm:ss",
        [47] = "mmss.0",
        [51] = "MM-dd",
        [52] = "yyyy-MM-dd",
        [53] = "yyyy-MM-dd",
        [55] = "yyyy-MM-dd",
        [56] = "yyyy-MM-dd",
        [58] = "MM-dd",
        [165] = "M/d/yy",
        [166] = "dd MMMM yyyy",
        [167] = "dd/MM/yyyy",
        [168] = "dd/MM/yy",
        [169] = "d.M.yy",
        [170] = "yyyy-MM-dd",
        [171] = "dd MMMM yyyy",
        [172] = "d MMMM yyyy",
        [173] = "M/d",
        [174] = "M/d/yy",
        [175] = "MM/dd/yy",
        [176] = "d-MMM",
        [177] = "d-MMM-yy",
        [178] = "dd-MMM-yy",
        [179] = "MMM-yy",
        [180] = "MMMM-yy",
        [181] = "MMMM d, yyyy",
        [182] = "M/d/yy hh:mm t",
        [183] = "M/d/y HH:mm",
        [184] = "MMM",
        [185] = "MMM-dd",
        [186] = "M/d/yyyy",
        [187] = "d-MMM-yyyy"
    };

    private string GetDateTimeFormat(UInt32Value numberFormatId)
    {
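        // Returns string.Empty when the format ID is missing or is not a known date format.
        return numberFormatId != null && DateFormatDictionary.TryGetValue(numberFormatId.Value, out var format)
            ? format
            : string.Empty;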
    }

You can use DateTime.FromOADate(41690).

I had the same issue and switched to EPPlus: http://epplus.codeplex.com/

Note that it is LGPL-licensed. So if you need to keep your code base clear of GPL issues, simply use the library as is and the license of your own code base is unaffected.

Adding my 2 pence worth. I am processing a template, so I know a given cell is meant to be a DateTime. So I end up in this method with a string parameter excelDateTimeAsString containing the cell value, which will typically be an OADate number like "42540.041666666664".

public static bool TryParseExcelDateTime(string excelDateTimeAsString, out DateTime dateTime)
{
    dateTime = default(DateTime); // an out parameter must be assigned on every return path
    double oaDateAsDouble;
    if (!double.TryParse(excelDateTimeAsString, out oaDateAsDouble)) //this line is Culture dependent!
        return false;
    //[...]
    dateTime = DateTime.FromOADate(oaDateAsDouble);
    return true;
}

My problem is that the end user is in Germany, and because this is a website, we've set Thread.CurrentThread.CurrentCulture and Thread.CurrentThread.CurrentUICulture to "de-DE". When you call double.TryParse, it uses the current culture to parse the number. So double.TryParse("42540.041666666664", out oaDate) does succeed, but it yields 42540041666666664, because in German the dot is a group separator. DateTime.FromOADate then fails because the number is out of range (minOaDate = -657435.0, maxOaDate = +2958465.99999999).

This makes me think that:

  1. regardless of the locale on a user's machine, the OpenXML document contains numbers formatted in a default locale (US? invariant? in any case, with the dot as the decimal separator). I've searched, but have not found the spec for this.
  2. when calling double.TryParse on a potential OADate string, we should use double.TryParse(excelDateTimeAsString, NumberStyles.Any, CultureInfo.InvariantCulture, out oaDateAsDouble). I'm using CultureInfo.InvariantCulture, but it should be whatever point 1 turns out to be, which I don't know for sure.
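Putting the two points together, a culture-independent version of the method might look like the sketch below. It is only an illustration: the class name is invented, NumberStyles.Float is used instead of NumberStyles.Any so that group separators are rejected rather than silently accepted, and the out-of-range case is reported as a failed parse instead of an exception.

```csharp
using System;
using System.Globalization;

public static class ExcelDateParsing
{
    // Hypothetical helper: parses a raw cell value such as "42540.041666666664".
    public static bool TryParseExcelDateTime(string excelDateTimeAsString, out DateTime dateTime)
    {
        dateTime = default(DateTime);

        // The invariant culture treats '.' as the decimal separator,
        // whatever Thread.CurrentThread.CurrentCulture is set to.
        if (!double.TryParse(excelDateTimeAsString, NumberStyles.Float,
                CultureInfo.InvariantCulture, out double oaDateAsDouble))
            return false;

        try
        {
            dateTime = DateTime.FromOADate(oaDateAsDouble);
            return true;
        }
        catch (ArgumentException)
        {
            // FromOADate throws when the number is outside the valid OLE Automation date range.
            return false;
        }
    }
}
```

With this version, "42540.041666666664" parses to 19 June 2016, 01:00, even when the current culture is German.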

We need to adopt different strategies to parse different types of columns using OpenXML.

To parse string and boolean values, we can use the DataType property of the cell, like below:

        switch (cell.DataType.Value)
        {
            case CellValues.SharedString:
                // Fetch value from SharedStrings array
                break;
            case CellValues.Boolean:
                text = cell.InnerText;
                switch (text)
                {
                    case "0": text = "false"; break;
                    default: text = "true"; break;
                }
                break;
        }

To parse date/time/datetime values (with any built-in or custom format applied): the DataType property is returned as null, so this can be written as below. Note that plain numbers also come back with a null DataType, so this check alone does not prove the cell is a date.

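    if (cell.DataType == null)
    {
        // Parse with the invariant culture so the '.' decimal separator is read correctly.
        DateTime date = DateTime.FromOADate(double.Parse(cell.InnerText, CultureInfo.InvariantCulture));
    }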

The value returned above will be displayed in the default format for your machine's locale settings. However, if you need the value in the actual format used in your Excel file and you are not sure what that format is, you can access the StyleIndex property associated with such cells.
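If a fixed, locale-independent output is enough, the DateTime can also be formatted explicitly. A minimal sketch; the class and method names and the "yyyy-MM-dd" pattern are illustrative choices, not something taken from the workbook:

```csharp
using System;
using System.Globalization;

public static class SerialDateFormatting
{
    // Hypothetical helper: turns a raw cell value such as "40638" into "2011-04-05".
    public static string ToIsoDate(string rawCellValue)
    {
        double serial = double.Parse(rawCellValue, NumberStyles.Float, CultureInfo.InvariantCulture);

        // An explicit pattern plus the invariant culture gives the same text on every machine.
        return DateTime.FromOADate(serial).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}
```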

This StyleIndex property gives you the index of the style applied to the cell, which can be found in the styles.xml file (under the <cellXfs> tag):

    <cellXfs count="3">
        <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>          
        <xf numFmtId="168" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="169" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
    </cellXfs>

In the above case, the StyleIndex value can be 0, 1 or 2, as there are 3 styles defined. Styles with numFmtId in the range 0 to 163 correspond to built-in formats provided by Excel, and numFmtId >= 164 corresponds to custom formats.

From the StyleIndex value obtained above, you get the numFmtId, which maps to a particular <numFmt> tag under the <numFmts> section (in the styles.xml file) and gives the actual date format applied to the cell.

    <numFmts count="2">
       <numFmt numFmtId="168" formatCode="[$£-809]#,##0.00"/>
       <numFmt numFmtId="169" formatCode="dd\-mmm\-yyyy\ hh:mm:ss"/>
    </numFmts>

The date format applied to the cell can also be fetched using the OpenXML API:

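      // cellFormats and numberingFormats are the stylesheet's CellFormats and NumberingFormats elements.
      CellFormat cellFmt = cellFormats.ChildElements[int.Parse(cell.StyleIndex.InnerText)] as CellFormat;
      // Built-in formats usually have no <numFmt> entry, so format is null for them.
      string format = numberingFormats.Elements<NumberingFormat>()
                .FirstOrDefault(i => i.NumberFormatId.Value == cellFmt.NumberFormatId.Value)
                ?.FormatCode;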
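Having the numFmtId and the format code still leaves the question of whether they describe a date. One possible heuristic, sketched below under stated assumptions: built-in IDs 14 to 22 and 45 to 47 are the standard date/time formats, and a custom format code is treated as a date if it contains a d, m, y, h or s placeholder outside quoted text, escaped characters and [bracketed] sections. The class and method names are invented, and the check is deliberately simple (for instance, a code consisting only of "[h]" would be missed).

```csharp
using System;

public static class NumberFormatClassifier
{
    // Built-in date/time formats: 14-22 and 45-47.
    public static bool IsBuiltInDateFormat(uint numFmtId)
    {
        return (numFmtId >= 14 && numFmtId <= 22) || (numFmtId >= 45 && numFmtId <= 47);
    }

    // Heuristic check of a custom format code such as "dd\-mmm\-yyyy\ hh:mm:ss".
    public static bool LooksLikeDateFormat(string formatCode)
    {
        if (string.IsNullOrEmpty(formatCode)) return false;

        bool inQuotes = false, inBrackets = false;
        for (int i = 0; i < formatCode.Length; i++)
        {
            char c = formatCode[i];
            if (inQuotes) { if (c == '"') inQuotes = false; continue; }      // skip "literal text"
            if (inBrackets) { if (c == ']') inBrackets = false; continue; }  // skip [Red], [$£-809], ...

            switch (char.ToLowerInvariant(c))
            {
                case '"': inQuotes = true; break;
                case '[': inBrackets = true; break;
                case '\\': i++; break; // the next character is an escaped literal
                case 'd': case 'm': case 'y': case 'h': case 's':
                    return true;
            }
        }
        return false;
    }
}
```

Applied to the two custom formats in the styles.xml excerpt above, the currency format 168 is rejected and the date format 169 is accepted.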

Each cell has 2 attributes: r (CellReference) and s (StyleIndex).

In this case the StyleIndex for numbers is 2 and for dates is 3 (the actual values depend on the styles defined in the workbook).

The date is stored as an OADate, which you can convert to a string:

value = DateTime.FromOADate(double.Parse(value)).ToShortDateString();

I do this after I retrieve any inline string:

    private static object Convert(this DocumentFormat.OpenXml.Spreadsheet.CellValues value, string content)
    {
        switch (value)
        {
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Boolean:
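                // Check for null before reading Length.
                if (content == null || content.Length < 2)
                {
                    return content?.ToUpperInvariant() == "T" || content == "1";
                }
                return System.Convert.ToBoolean(content);
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Date:
                // Parse with the invariant culture so '.' is always the decimal separator.
                if (double.TryParse(content, System.Globalization.NumberStyles.Float,
                        System.Globalization.CultureInfo.InvariantCulture, out double result))
                {
                    return System.DateTime.FromOADate(result);
                }
                return null;
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Number:
                return System.Convert.ToDecimal(content, System.Globalization.CultureInfo.InvariantCulture);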
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Error:
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.String:
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.InlineString:
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.SharedString:
            default:
                return content;
        }
    }
