Reading a very large Excel file

I am following this article to read a very large Excel file using the SAX approach:

https://msdn.microsoft.com/en-us/library/office/gg575571.aspx

I can't store the values in a DataTable or otherwise keep them in memory, because the client machine does not have enough memory. Instead, I am trying to read each value and store it in a database right away:

// The SAX approach.
static void ReadExcelFileSAX(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

        OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
        string text;
        while (reader.Read())
        {
            if (reader.ElementType == typeof(CellValue))
            {
                text = reader.GetText();
                Console.Write(text + " ");
            }
        }
        Console.WriteLine();
        Console.ReadKey();
    }
}

For example, when I read this Excel file:

Test 1
22
345345
345345435
2333
333333
4444
4444444
324324
99999

I get this output:

Blank
22
Blank
345345
Blank 
etc

I have no idea where the blanks are coming from. I tried adding an if statement to skip the blanks, but then I miss the last value, 99999.

The reader seems very limited. I would really appreciate any suggestion at all.

The OpenXmlReader treats the start and end elements as independent items. These can be differentiated by checking the IsStartElement and IsEndElement properties.

Your blank values come from the end elements, for which GetText returns an empty string.
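One way to see this for yourself is a small diagnostic loop that labels every CellValue hit as a start or end event. This is only a sketch: the class name ReaderEventDump and taking the file path from args[0] are assumptions for illustration, and it needs the Open XML SDK (the DocumentFormat.OpenXml package) referenced.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical diagnostic program, not part of the original question.
static class ReaderEventDump
{
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the .xlsx file.
        using (SpreadsheetDocument document = SpreadsheetDocument.Open(args[0], false))
        {
            WorksheetPart worksheetPart = document.WorkbookPart.WorksheetParts.First();
            using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
            {
                while (reader.Read())
                {
                    // Only look at CellValue elements, as the question's loop does.
                    if (reader.ElementType != typeof(CellValue))
                    {
                        continue;
                    }

                    // Label the event so start and end hits can be told apart.
                    string kind = reader.IsStartElement ? "start"
                                : reader.IsEndElement ? "end"
                                : "other";
                    Console.WriteLine(kind + ": '" + reader.GetText() + "'");
                }
            }
        }
    }
}
```

Each value should show up once on a start line, followed by an end line with empty text, which is the pattern of blanks seen in the question.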

You have two options to fix it. First, you could check IsStartElement in your loop:

while (reader.Read())
{
    if (reader.ElementType == typeof(CellValue)
        && reader.IsStartElement)
    {
        text = reader.GetText();
        Console.WriteLine(text + " ");
    }
}

Alternatively, you can use the LoadCurrentElement method to load the whole element, which consumes both the start and the end element that you were previously getting as separate items:

while (reader.Read())
{
    if (reader.ElementType == typeof(CellValue))
    {
        CellValue cellVal = (CellValue)reader.LoadCurrentElement();
        Console.WriteLine(cellVal.Text);
    }
}
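Since the goal in the question is to push each value to a database without holding the sheet in memory, the first option can be wrapped in an iterator that yields values one at a time. The following is a minimal sketch under stated assumptions: the names SaxReaderSketch and ReadCellValues are made up for illustration, the file path comes from args[0], and Console.WriteLine stands in for whatever database insert you actually use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical wrapper around the loop from the answer.
static class SaxReaderSketch
{
    // Lazily yields the raw text of each CellValue in the first worksheet part.
    // Only one value is held at a time; the document stays open while iterating.
    static IEnumerable<string> ReadCellValues(string fileName)
    {
        using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileName, false))
        {
            WorksheetPart worksheetPart = document.WorkbookPart.WorksheetParts.First();
            using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
            {
                while (reader.Read())
                {
                    // Skip the end-element hits, which return empty text.
                    if (reader.ElementType == typeof(CellValue) && reader.IsStartElement)
                    {
                        // Raw text; for cells stored as shared strings this is
                        // an index into the shared string table, not the string.
                        yield return reader.GetText();
                    }
                }
            }
        }
    }

    static void Main(string[] args)
    {
        foreach (string value in ReadCellValues(args[0]))
        {
            // Placeholder: replace with your own database insert.
            Console.WriteLine(value);
        }
    }
}
```

Because the method is an iterator, the file is opened only when enumeration starts and is closed when the foreach loop finishes or is abandoned.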
