
Read large excel sheet using Interop in C#

I know my question may seem common, and a lot of knowledge on it has already been shared on this forum. But I'm unable to find a solution for my particular requirement.

In my case I have an Excel workbook (version 2016) that contains 10 sheets with 1500 rows each. The number of columns varies from 15 to 20 per sheet. I want to read all the data from all the sheets, validate the data type of each value, and insert it into a SQL Server database table.

So far I've only tried this with 2 sheets of 100 rows each.

Excel.Worksheet ofWs;
Excel.Range range;
ofWs = (Excel.Worksheet)ofShe.get_Item("Sales");
range = ofWs.UsedRange;
for (int i = 2; i <= range.Rows.Count; i++)
{
    string var1 = "";
    double var2 = 0;

    //validation for column1
    if ((ofWs.Cells[i, 1] as Excel.Range).Value2 != null)
    {
        if ((ofWs.Cells[i, 1] as Excel.Range).Value2.GetType().ToString() == "System.String")
            var1 = (string)(ofWs.Cells[i, 1] as Excel.Range).Value2;
        else
        {
            sale_comm_column += "COLUMN A, ";
            sale_errFlag = false;
        }
    }
    else
    {
        sale_comm_column += "COLUMN A, ";
        sale_errFlag = false;
    }

    //validation for column2
    if ((ofWs.Cells[i, 2] as Excel.Range).Value2 != null)
    {
        if ((ofWs.Cells[i, 2] as Excel.Range).Value2.GetType().ToString() == "System.Double")
            var2 = (double)(ofWs.Cells[i, 2] as Excel.Range).Value2;
        else
        {
            sale_comm_column += "COLUMN B, ";
            sale_errFlag = false;
        }
    }
    else
    {
        sale_comm_column += "COLUMN B, ";
        sale_errFlag = false;
    }

}

This for loop goes through all the rows, and I validate each column in an "if" statement. Here I've shown the validation for only 2 columns of the 1st sheet. Even for 100 rows it takes too much time. However, if I remove all these "if" statements, it takes less time. If I want to do this for my actual requirement of 10 sheets with 1500 rows each, what is the best way to accomplish it?

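Your method takes a lot of time because every time you access a cell, an RPC call is made to the Excel instance. Each `ofWs.Cells[i, j]` and each `.Value2` in your loop is a separate cross-process call, and your "if" statements repeat them up to three times per cell.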
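If you want to stay with Interop, one common way to cut the number of calls is to read the whole range once and validate in memory. `Value2` on a multi-cell range returns an `object[,]` (1-based; a single-cell range returns a scalar instead). The sketch below is not the asker's code: the `Validate` helper and the sample array are made up for illustration, and it only assumes that the array came from something like `(object[,])ofWs.UsedRange.Value2`. It uses a plain in-memory array so it runs without Excel.

```csharp
using System;
using System.Collections.Generic;

static class BulkValidateSketch
{
    // Hypothetical helper: returns one message per data row that has a
    // missing or wrongly typed cell. Column A must be a string and
    // column B a double, as in the question.
    static List<string> Validate(object[,] cells)
    {
        var errors = new List<string>();
        // Use the array bounds instead of assuming 0- or 1-based indexing,
        // because arrays returned by Interop are 1-based.
        int firstRow = cells.GetLowerBound(0), lastRow = cells.GetUpperBound(0);
        int firstCol = cells.GetLowerBound(1);

        for (int r = firstRow + 1; r <= lastRow; r++) // skip the header row
        {
            var bad = new List<string>();
            // "is" is false for null, so empty cells are reported as well.
            if (!(cells[r, firstCol] is string)) bad.Add("COLUMN A");
            if (!(cells[r, firstCol + 1] is double)) bad.Add("COLUMN B");
            if (bad.Count > 0)
                errors.Add($"row {r - firstRow + 1}: {string.Join(", ", bad)}");
        }
        return errors;
    }

    static void Main()
    {
        // Stand-in for (object[,])ofWs.UsedRange.Value2 (assumption).
        object[,] cells =
        {
            { "Name", "Amount" },
            { "Alice", 12.5 },
            { null, 3.0 },     // missing value in column A
            { "Bob", "n/a" },  // text where a number is expected
        };
        foreach (string e in Validate(cells)) Console.WriteLine(e);
    }
}
```

With this shape, each sheet costs one cross-process call for the data instead of several per cell.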

Assuming your Excel file is in .xlsx format, I would recommend the following:

  1. If you open an Excel file with a hex editor, you will notice that the file signature is PK (the zip file format), meaning that it is basically a set of zipped XML files.
  2. Unzip the Excel file; inside the 'xl\worksheets' folder you will see the 'sheet[1~10].xml' files.
  3. Write code that reads and validates the XML files and inserts the data into the database.
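The steps above can be sketched with `ZipArchive` and LINQ to XML, without unzipping to disk. This is a minimal sketch under a few assumptions: the workbook path is passed as the first command-line argument, the sheet of interest is stored as `xl/worksheets/sheet1.xml` (the actual sheet-name-to-file mapping lives in `xl/workbook.xml` and its relationships), and the class and helper names are made up. Text cells are usually stored as an index into `xl/sharedStrings.xml` (cell attribute `t="s"`), so that table is loaded first. On .NET Framework, `ZipFile` needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class XlsxSketch
{
    // Namespace of the SpreadsheetML worksheet and shared-string parts.
    static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Loads one XML part from the package, or returns null if it is absent.
    static XDocument LoadPart(ZipArchive zip, string name)
    {
        ZipArchiveEntry entry = zip.GetEntry(name);
        if (entry == null) return null;
        using (var stream = entry.Open())
            return XDocument.Load(stream);
    }

    static void Main(string[] args)
    {
        using (ZipArchive zip = ZipFile.OpenRead(args[0]))
        {
            // Shared string table: one <si> per distinct string, holding
            // either a plain <t> or several rich-text runs <r><t>.
            XDocument sst = LoadPart(zip, "xl/sharedStrings.xml");
            List<string> shared = sst == null
                ? new List<string>()
                : sst.Root.Elements(Ns + "si")
                     .Select(si => string.Concat(
                         si.Elements(Ns + "t")
                           .Concat(si.Elements(Ns + "r").Elements(Ns + "t"))
                           .Select(t => t.Value)))
                     .ToList();

            // Assumed part name; check xl/workbook.xml for the real mapping.
            XDocument sheet = LoadPart(zip, "xl/worksheets/sheet1.xml");
            if (sheet == null) { Console.WriteLine("sheet1.xml not found"); return; }

            foreach (XElement c in sheet.Descendants(Ns + "c"))
            {
                string cellRef = (string)c.Attribute("r"); // e.g. "B7"
                string type = (string)c.Attribute("t");    // "s" = shared string
                string raw = (string)c.Element(Ns + "v");  // null for empty cells
                string value = (type == "s" && raw != null)
                    ? shared[int.Parse(raw)]
                    : raw;
                Console.WriteLine($"{cellRef}: {value}");
            }
        }
    }
}
```

The `t` attribute already tells you whether a cell holds a string or a number (numeric cells normally have no `t` attribute), which is most of what the validation in the question needs. Dates are stored as numbers with a date style, so they need extra handling.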

The above process can easily be automated and should be much faster than using Excel Interop.

This is primarily a speed question, so the speed rant is worth reading. You can skip Part 1.

Keep DB Operations in the DB

As you are inserting this data into a DB, you should probably be doing that in the DB. Every DBMS worth its disk space will have an option to do bulk inserts. You are not going to beat that by doing it in the client; that only adds the need to transmit the data over the network.

CSV support is guaranteed, and support for the Excel format is very common. When in doubt, you can save Excel files as CSV if you do not need the formatting and type hints. You might need to do some parsing then, however.

OpenXML vs Office Interop

There are 3 options to work with Office Formats in .NET:

  1. If you only need the new formats (.xlsx), use the OpenXML SDK, or any of the wrappers people have made around it, or even just the ZipArchive and XmlReader classes. It is an open format, based around a bunch of XML files in a .zip container.
  2. If you also need to support the old formats (.xls), you have to use the (t)rusty Office Interop. It has all the usual issues of COM interop, and it also needs the program installed and requires an interactive session. That means it cannot run from a service, including most web servers started as one.
  3. For any given problem, any given language and any given format there might be a 3rd way. But those are few and far apart. I even count the DBMS way (if available) among them.

While I have done no measurements, I am willing to bet real money that OpenXML beats Office Interop in speed. One is just doing basic file operations with XML parsing and zip file decompression. The other has the overhead of COM interop and of remoting an invisible Office instance to do the work. It is not even a question which would win on speed. The only question is whether it is fast enough.
