

Microsoft.Office.Interop.Excel or EPPlus for reading a huge (or not) Excel file

I wrote some code to read a column from an Excel file. I use Microsoft.Office.Interop.Excel for this: I first read the entire Range into a System.Array, then do some operations on the System.Array values, and finally convert them to a List, because I use it to fill a ListBox. This is the code (only the relevant parts):

private List<string> bd = new List<string>();
private static System.Array objRowAValues;

private List<string> bl = new List<string>();
private static System.Array objRowBValues;

private List<string> cm = new List<string>();
private static System.Array objRowCValues;

private List<string> pl = new List<string>();
private List<string> bdCleanList;
private static Microsoft.Office.Interop.Excel.Application appExcel;

Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range rngARowLast, rngBRowLast, rngCRowLast;

long lastACell, lastBCell, lastCCell, fullRow;

private void btnCargarExcel_Click(object sender, EventArgs e)
    {
        if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
        {
            if (System.IO.File.Exists(openFileDialog1.FileName))
            {
                Stopwatch stopWatch = new Stopwatch();
                stopWatch.Start();
                Thread.Sleep(10000);

                filePath.Text = openFileDialog1.FileName.ToString();

                xlApp = new Microsoft.Office.Interop.Excel.Application();
                xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true,
                                                  Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false,
                                                  false, 0, true, 1, 0);
                xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

                fullRow = xlWorkSheet.Rows.Count;
                lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
                rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
                objRowAValues = (System.Array)rngARowLast.Cells.Value;

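                foreach (object elem in objRowAValues)
                {
                    // Empty cells come back as null, so check before calling ToString()
                    string text = elem == null ? "" : elem.ToString();
                    if (text != "")
                    {
                        bd.Add(cleanString(text, 10));
                    }
                }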

                nrosProcesados.Text = bd.Count().ToString();
                listBox1.DataSource = bd;

                xlWorkBook.Close(true, null, null);
                xlApp.Quit();

                releaseObject(xlWorkSheet);
                releaseObject(xlWorkBook);
                releaseObject(xlApp);

                stopWatch.Stop();

                TimeSpan ts = stopWatch.Elapsed;
                executiontime.Text =
                    String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds,
                                  ts.Milliseconds / 10).ToString();
            }
            else
            {
                MessageBox.Show("No se pudo abrir el fichero!");
                // appExcel is never assigned above; ReleaseComObject throws on null
                if (appExcel != null)
                {
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                    appExcel = null;
                }
                System.Windows.Forms.Application.Exit();
            }
        }
    }
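
The "read the Range into a System.Array, then convert to a List" step can be isolated from Excel and tested on its own. What follows is a minimal sketch, not the original code: `RangeValues` and `ToStringList` are hypothetical names, and the `cleanString` step from the code above is left out because its body is not shown. It handles two things the cast in the handler does not: empty cells arrive as `null`, and a single-cell Range returns a scalar from `Value` rather than an array.

```csharp
using System;
using System.Collections.Generic;

static class RangeValues
{
    // Flattens whatever Range.Value returned into a list of non-empty strings.
    public static List<string> ToStringList(object rangeValue)
    {
        var result = new List<string>();
        if (rangeValue == null) return result;

        var array = rangeValue as Array;
        if (array == null)
        {
            // Single-cell range: Value is the scalar itself, not an array.
            AddIfNotEmpty(result, rangeValue);
            return result;
        }

        // foreach walks every element of a multi-dimensional array in order.
        foreach (object item in array) AddIfNotEmpty(result, item);
        return result;
    }

    private static void AddIfNotEmpty(List<string> target, object value)
    {
        if (value == null) return; // empty cell
        string text = value.ToString();
        if (text.Length > 0) target.Add(text);
    }
}
```

In the handler above this would be called as `RangeValues.ToStringList(rngARowLast.Cells.Value)`, with any per-value cleaning applied to the returned list.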

I tested it with an Excel file containing ~800,000 cells, and it took less than 2 minutes. Then I tested the samples from EPPlus, and they are faster than my approach, so I am thinking of using EPPlus instead of Microsoft.Office.Interop.Excel. I also considered the OpenXML SDK, but I couldn't find any example that suits my goals, so I am leaving it aside for now. The EPPlus sample uses this code to read from an Excel file:

ExcelWorksheet sheet = package.Workbook.Worksheets[1];

var query1= (from cell in sheet.Cells["d:d"] where cell.Value is double && (double)cell.Value >= 9990 && (double)cell.Value <= 10000 select cell);

Of course they use LINQ here, but my questions on this topic are:

  • Which approach do you use?
  • What are your recommendations on this?
  • Any help writing the same thing using EPPlus or the OpenXML SDK?

I'm a newbie in the C# world, coming from the PHP world, and this is my first project.

Which approach do you use? - EPPlus

What are your recommendations on this? - I've found EPPlus to be hugely faster. It is also an easier API to work with, in my opinion, for many reasons, one being the lack of COM interop (which helps both speed and ease of use). It also has fewer requirements, especially when deploying to a server environment: no installing Excel junk.

Any help writing the same thing using EPPlus or the OpenXML SDK? - The EPPlus API is fairly straightforward. Make an attempt and post more specific questions with what you've tried so far.
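
As a starting point for such an attempt, here is a minimal sketch of reading the first column into a List with EPPlus. It is not taken from the EPPlus samples: `EpplusColumnReader` and `ReadFirstColumn` are hypothetical names, and it assumes an .xlsx file (EPPlus does not read the old binary .xls format) and an EPPlus 4.x-style setup. EPPlus 5 and later also require a license setting before first use, which is omitted here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OfficeOpenXml;

static class EpplusColumnReader
{
    // Hypothetical helper: returns the non-empty values of column A as strings.
    public static List<string> ReadFirstColumn(string path)
    {
        var values = new List<string>();
        using (var package = new ExcelPackage(new FileInfo(path)))
        {
            // First() sidesteps the 0-based vs 1-based worksheet index difference
            // between EPPlus versions and target frameworks.
            ExcelWorksheet sheet = package.Workbook.Worksheets.First();

            // Dimension is null when the sheet has no cells at all.
            if (sheet.Dimension == null) return values;

            int lastRow = sheet.Dimension.End.Row;
            for (int row = 1; row <= lastRow; row++)
            {
                // Text is the cell value formatted as a string; empty cells give "".
                string text = sheet.Cells[row, 1].Text;
                if (!string.IsNullOrEmpty(text)) values.Add(text);
            }
        }
        return values;
    }
}
```

The returned list could be assigned to `listBox1.DataSource` in the same way as `bd` in the question. Note that `Dimension.End.Row` is the last used row of the whole sheet, not of column A alone, which is why empty cells are still filtered inside the loop.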

Another way to loop through cells:

var firstColumnRows = sheet.Cells["A2:A"];

// Loop through rows in the first column, get values based on offset
foreach (var cell in firstColumnRows)
{
    var column1CellValue = cell.GetValue<string>();
    var neighborCellValue = cell.Offset(0, 1).GetValue<string>();
}
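
The question also asks about the OpenXML SDK, which the answer above does not cover. The following is only a rough sketch under stated assumptions: `OpenXmlColumnReader`, `ReadColumnA`, and `IsColumnA` are hypothetical names, every cell is assumed to carry a `CellReference` (Excel writes one, but the file format does not require it), and only shared strings and plain stored values are handled. It loads the worksheet as a DOM, so for very large files the SDK's streaming `OpenXmlReader` may be the better fit.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static class OpenXmlColumnReader
{
    // True for references such as "A1" or "A250", false for "AA1" or "B3".
    public static bool IsColumnA(string reference)
    {
        return reference != null && reference.Length > 1
            && reference[0] == 'A' && char.IsDigit(reference[1]);
    }

    public static List<string> ReadColumnA(string path)
    {
        var values = new List<string>();
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
        {
            WorkbookPart workbookPart = doc.WorkbookPart;
            Sheet firstSheet = workbookPart.Workbook.Descendants<Sheet>().First();
            var worksheetPart = (WorksheetPart)workbookPart.GetPartById(firstSheet.Id);

            // Text cells usually store an index into this shared table, not the text.
            SharedStringTablePart sharedPart = workbookPart.SharedStringTablePart;

            foreach (Cell cell in worksheetPart.Worksheet.Descendants<Cell>())
            {
                if (cell.CellReference == null || !IsColumnA(cell.CellReference.Value)) continue;
                if (cell.CellValue == null) continue; // no stored value

                string text = cell.CellValue.Text;
                if (sharedPart != null && cell.DataType != null
                    && cell.DataType.Value == CellValues.SharedString)
                {
                    text = sharedPart.SharedStringTable.ElementAt(int.Parse(text)).InnerText;
                }
                if (!string.IsNullOrEmpty(text)) values.Add(text);
            }
        }
        return values;
    }
}
```

Compared with the EPPlus loops above, this is noticeably more code for the same result, which is consistent with the recommendation to start with EPPlus.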


 