
How to load Excel rows on demand into a DataTable in C#

I have a requirement to fill a DataTable from a Microsoft Excel sheet.

The sheet may contain a lot of data, so the DataTable should be filled on demand while a foreach loop iterates over it, rather than being loaded in full up front.

For example, if the sheet has 1,000,000 records, the DataTable should fetch data in batches of 100, depending on the current position of the foreach loop.

Any pointers or suggestions would be appreciated.

I would suggest using OpenXML to parse and read the Excel data from the file. This also allows you to read specific sections or regions of the workbook.

You will find more information and an example at this link: Microsoft Docs - Parse and read a large spreadsheet document (Open XML SDK)

This should be more efficient and easier to develop than using the official Microsoft Office Excel interop.

**I am not near a PC with Visual Studio, so this code is untested and may contain syntax errors until I can test it later.**

It will still give you the main idea of what needs to be done.

// Requires: using Excel = Microsoft.Office.Interop.Excel;
private void ExcelDataPages(int firstRecord, int numberOfRecords)
{
    Excel.Application dataApp = new Excel.Application();
    Excel.Workbook dataWorkbook = null;

    // DisplayAlerts, Visible and AutomationSecurity belong to the Application, not the Workbook.
    dataApp.DisplayAlerts = false;
    dataApp.Visible = false;
    dataApp.AutomationSecurity = Microsoft.Office.Core.MsoAutomationSecurity.msoAutomationSecurityLow;

    try
    {
        // Workbooks are opened through the Application's Workbooks collection.
        dataWorkbook = dataApp.Workbooks.Open(@"C:\Test\YourWorkbook.xlsx");
        Excel.Worksheet dataSheet = (Excel.Worksheet)dataWorkbook.Worksheets["Name of Sheet"];

        for (int x = 0; x < numberOfRecords; x++)
        {
            // Excel row numbers are 1-based; this range covers all columns in the row.
            Excel.Range currentRange = (Excel.Range)dataSheet.Rows[firstRecord + x];

            foreach (Excel.Range r in currentRange.Cells)
            {
                // Do what you need to with the data here.
            }
        }
    }
    catch (Exception ex)
    {
        // Enter error handling here.
    }
    finally
    {
        // Depending on how quickly you will access the next batch of data, you may not want to
        // close the workbook, which avoids the load time on every call. That may mean opening the
        // workbook at a higher level in your class, or, if this is the main process of the app,
        // making it static so the garbage collector does not destroy the connection.
        if (dataWorkbook != null) dataWorkbook.Close(false);
        dataApp.Quit();
    }
}

Give the following a try. It uses the NuGet package DocumentFormat.OpenXml. The code is from Using OpenXmlReader; however, I modified it to add data to a DataTable. Since you're reading data from the same Excel file multiple times, it's faster to open the Excel file once using an instance of SpreadsheetDocument and dispose of it when finished. Since the instance of SpreadsheetDocument needs to be disposed of before your application exits, IDisposable is used.

Where it says "ToDo", you'll need to replace the code that creates the DataTable columns with your own code to create the correct columns for your project.

I tested the code below with an Excel file containing approximately 15,000 rows. When reading 100 rows at a time, the first read took approximately 500 ms - 800 ms, whereas subsequent reads took approximately 100 ms - 400 ms.

Create a class (name: HelperOpenXml)

HelperOpenXml.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Data;
using System.Diagnostics;

namespace ExcelReadSpecifiedRowsUsingOpenXml
{
    public class HelperOpenXml : IDisposable
    {
        public string Filename { get; private set; } = string.Empty;
        public int RowCount { get; private set; } = 0;

        private SpreadsheetDocument spreadsheetDocument = null;

        private DataTable dt = null;
        

        public HelperOpenXml(string filename)
        {
            this.Filename = filename;
        }

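        public void Dispose()
        {
            //release the file handle; resetting the field lets GetRowsSax reopen the file if called again
            if (spreadsheetDocument != null)
            {
                spreadsheetDocument.Dispose();
                spreadsheetDocument = null;
            }

            //remove cached rows
            if (dt != null)
            {
                dt.Clear();
            }
        }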

        public DataTable GetRowsSax(int startRow, int endRow, bool firstRowIsHeader = false)
        {
            int startIndex = startRow;
            int endIndex = endRow;

            if (firstRowIsHeader)
            {
                //if first row is header, increment by 1
                startIndex = startRow + 1;
                endIndex = endRow + 1;
            }

            if (spreadsheetDocument == null)
            {
                //create new instance
                spreadsheetDocument = SpreadsheetDocument.Open(Filename, false);

                //create new instance
                dt = new DataTable();

                //ToDo: replace 'dt.Columns.Add(...)' below with your code to create the DataTable columns
                //add columns to DataTable
                dt.Columns.Add("A");
                dt.Columns.Add("B");
                dt.Columns.Add("C");
                dt.Columns.Add("D");
                dt.Columns.Add("E");
                dt.Columns.Add("F");
                dt.Columns.Add("G");
                dt.Columns.Add("H");
                dt.Columns.Add("I");
                dt.Columns.Add("J");
                dt.Columns.Add("K");

            }
            else
            {
                //remove existing data from DataTable
                dt.Rows.Clear(); 

            }

            WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;

            int numWorkSheetParts = 0;

            foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
            {
                using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
                {
                    int rowIndex = 0;

                    //use the reader to read the XML
                    while (reader.Read())
                    {
                        if (reader.ElementType == typeof(Row))
                        {
                            reader.ReadFirstChild();

                            List<string> cValues = new List<string>();
                            int colIndex = 0;
                            do
                            {
                                //only get data from desired rows
                                if ((rowIndex > 0 && rowIndex >= startIndex && rowIndex <= endIndex) ||
                                (rowIndex == 0 && !firstRowIsHeader && rowIndex >= startIndex && rowIndex <= endIndex))
                                {

                                    if (reader.ElementType == typeof(Cell))
                                    {
                                        Cell c = (Cell)reader.LoadCurrentElement();

                                        string cellRef = c.CellReference; //ex: A1, B1, ..., A2, B2

                                        string cellValue = string.Empty;

                                        //string/text data is stored in SharedString
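                                        if (c.DataType != null && c.DataType == CellValues.SharedString)
                                        {
                                            SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));

                                            //InnerText also covers rich-text entries, where ssi.Text is null
                                            cellValue = ssi.InnerText;
                                        }
                                        else if (c.CellValue != null)
                                        {
                                            //cells without a value element (ex: empty cells) keep string.Empty
                                            cellValue = c.CellValue.InnerText;
                                        }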

                                        //Debug.WriteLine("{0}: {1} ", c.CellReference, cellValue);

                                        //add value to List which is used to add a row to the DataTable
                                        cValues.Add(cellValue);
                                    }
                                }

                                colIndex += 1; //increment

                            } while (reader.ReadNextSibling());

                            if (cValues.Count > 0)
                            {
                                //if List contains data, use it to add row to DataTable
                                dt.Rows.Add(cValues.ToArray()); 
                            }

                            rowIndex += 1; //increment

                            if (rowIndex > endIndex)
                            {
                                break; //exit loop
                            }
                        }
                    }
                }

                numWorkSheetParts += 1; //increment
            }

            DisplayDataTableData(dt); //display data in DataTable

            return dt;
        }

        
        private void DisplayDataTableData(DataTable dt)
        {
            foreach (DataColumn dc in dt.Columns)
            {
                Debug.WriteLine("colName: " + dc.ColumnName);
            }

            foreach (DataRow r in dt.Rows)
            {
                Debug.WriteLine(r[0].ToString() + " " + r[1].ToString());
            }
        }

    }
}

Usage:

private string excelFilename = @"C:\Temp\Test.xlsx";
private HelperOpenXml helperOpenXml = null;

            ...

private void GetData(int startIndex, int endIndex, bool firstRowIsHeader)
{
    helperOpenXml.GetRowsSax(startIndex, endIndex, firstRowIsHeader);
}

Note: Make sure to call Dispose() (ex: helperOpenXml.Dispose();) before your application exits.
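One way to get the foreach-driven behavior the question asks for is to wrap a batch reader in a C# iterator. The following is only a sketch under stated assumptions: BatchedRows and fetchBatch are hypothetical names that are not part of the answer above, and fetchBatch is assumed to return the rows in an inclusive zero-based range and an empty table once the range is past the end of the data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class BatchedRows
{
    // Lazily yields row values, asking fetchBatch for the next range only
    // when the consuming foreach loop has used up the current batch.
    // fetchBatch(startRow, endRow) is a hypothetical delegate (assumption):
    // inclusive, zero-based range; empty table when no rows are left.
    public static IEnumerable<object[]> Enumerate(
        Func<int, int, DataTable> fetchBatch, int batchSize = 100)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; ; start += batchSize)
        {
            DataTable batch = fetchBatch(start, start + batchSize - 1);

            if (batch.Rows.Count == 0)
                yield break; // no more data

            foreach (DataRow row in batch.Rows)
            {
                // ItemArray returns a copy of the values, so they stay valid
                // even if the reader reuses and clears its DataTable.
                yield return row.ItemArray;
            }
        }
    }
}
```

With the helper class above, the delegate could be something like (start, end) => helperOpenXml.GetRowsSax(start, end, true). Copying the values matters there because GetRowsSax returns the same DataTable instance on every call and clears it first.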

Update:

OpenXML stores dates as the number of days since 01 Jan 1900. Dates prior to 01 Jan 1900 are stored in SharedString. For more info, see Reading a date from xlsx using open xml sdk

Here's a code snippet:

Cell c = (Cell)reader.LoadCurrentElement();
             ...
string cellValue = string.Empty;
             ...
cellValue = c.CellValue.InnerText;

double dateCellValue = 0;
Double.TryParse(cellValue, out dateCellValue);

DateTime dt = DateTime.FromOADate(dateCellValue);

cellValue = dt.ToString("yyyy/MM/dd");
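The snippet above ignores the result of Double.TryParse, so non-numeric text would be converted as if it were 0. A slightly more defensive variant might look like the sketch below. ExcelDates and TryFormatSerialDate are hypothetical names introduced here for illustration, and deciding which cells actually hold dates (for example from the cell's style) is left out.

```csharp
using System;
using System.Globalization;

public static class ExcelDates
{
    // Tries to interpret the raw cell text as a serial date number and
    // format it as yyyy/MM/dd. Returns false for non-numeric text or for
    // numbers outside the range DateTime.FromOADate accepts.
    public static bool TryFormatSerialDate(string cellText, out string formatted)
    {
        formatted = null;

        // The XML stores numbers with '.' as the decimal separator,
        // so parse with the invariant culture.
        if (!double.TryParse(cellText, NumberStyles.Float,
                CultureInfo.InvariantCulture, out double serial))
        {
            return false;
        }

        try
        {
            DateTime date = DateTime.FromOADate(serial);
            formatted = date.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);
            return true;
        }
        catch (ArgumentException)
        {
            return false; // not a valid OLE Automation date
        }
    }
}
```

For example, TryFormatSerialDate("44197", out var text) returns true and sets text to "2021/01/01", while "Hello" returns false.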

I use this code with the EPPlus DLL; don't forget to add a reference to it. You should check that it matches your requirements.

public DataTable ReadExcelDatatable(bool hasHeader = true)
{
    using (var pck = new OfficeOpenXml.ExcelPackage())
    {
        using (var stream = File.OpenRead(this._fullPath))
        {
            pck.Load(stream);
        }

        var ws = pck.Workbook.Worksheets.First();

        DataTable tbl = new DataTable();

        foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
        {
            //table head: add exactly one DataTable column per worksheet column
            tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
        }

        var startRow = hasHeader ? 2 : 1;
        
        for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
        {
            var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
            DataRow row = tbl.Rows.Add();
            foreach (var cell in wsRow)
            {
                row[cell.Start.Column - 1] = cell.Text;
            }
        }

        return tbl;
    }
}

Another simple alternative is the NuGet package ExcelDataReader. Additional information is available at https://github.com/ExcelDataReader/ExcelDataReader

Usage example:

[Fact] 
void Test_ExcelDataReader() 
{
    
    System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
    var scriptPath = Path.GetDirectoryName(Util.CurrentQueryPath); // LinqPad script path
    var filePath = $@"{scriptPath}\TestExcel.xlsx";
    using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
    {
        // Auto-detect format, supports:
        //  - Binary Excel files (2.0-2003 format; *.xls)
        //  - OpenXml Excel files (2007 format; *.xlsx, *.xlsb)
        using (var reader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream))
        {
            var result = reader.AsDataSet();
            // The result of each spreadsheet is in result.Tables
            var t0 = result.Tables[0];
            Assert.True(t0.Rows[0][0].Dump("R0C0").ToString()=="Hello", "Expected 'Hello'");
            Assert.True(t0.Rows[0][1].Dump("R0C1").ToString()=="World!", "Expected 'World!'");          
        } // using
    } // using
} // fact

Before you start reading, you need to register an encoding provider as follows:

 System.Text.Encoding.RegisterProvider(
      System.Text.CodePagesEncodingProvider.Instance);

The cells are addressed the following way:

 var t0 = result.Tables[0]; // table 0 is the first worksheet
 var cell = t0.Rows[0][0];  // on table t0, read cell row 0 column 0

You can also loop through the rows and columns with nested for loops as follows:

for (int r = 0; r < t0.Rows.Count; r++)
{
    var row = t0.Rows[r];
    var columns = row.ItemArray;
    for (int c = 0; c < columns.Length; c++)
    {
        var cell = columns[c];
        cell.Dump();
    }
}
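AsDataSet() materializes every sheet in memory, which may not suit the batch-of-100 requirement from the question. The reader can also be advanced one row at a time with Read(), so a batch can be pulled only when it is needed. The following is a sketch that assumes the ExcelDataReader reader is passed in as a System.Data.IDataReader; DataReaderBatches and ReadNextBatch are hypothetical names, and only Read(), FieldCount and GetValue(i) are used.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataReaderBatches
{
    // Reads up to batchSize rows from the reader's current position.
    // An empty list means the current result set is exhausted.
    public static List<object[]> ReadNextBatch(IDataReader reader, int batchSize)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var rows = new List<object[]>(batchSize);

        // Check the count first so no row is consumed once the batch is full.
        while (rows.Count < batchSize && reader.Read())
        {
            var values = new object[reader.FieldCount];
            for (int i = 0; i < values.Length; i++)
            {
                values[i] = reader.GetValue(i);
            }
            rows.Add(values);
        }

        return rows;
    }
}
```

Calling ReadNextBatch(reader, 100) repeatedly inside the using block walks the first worksheet 100 rows at a time. The reader is forward-only, so earlier batches cannot be revisited without reopening the file.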

I'm going to give you a different answer. If loading a million rows into a DataTable performs badly, resort to using a driver to load the data: How to open a huge excel file efficiently

DataSet excelDataSet = new DataSet();

string filePath = @"c:\temp\BigBook.xlsx";

// For .XLSXs we use =Microsoft.ACE.OLEDB.12.0;, for .XLS we'd use Microsoft.Jet.OLEDB.4.0; with  "';Extended Properties=\"Excel 8.0;HDR=YES;\"";
string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties=\"Excel 12.0;HDR=YES;\"";

using (OleDbConnection conn = new OleDbConnection(connectionString))
{
    conn.Open();
    OleDbDataAdapter objDA = new System.Data.OleDb.OleDbDataAdapter
    ("select * from [Sheet1$]", conn);
    objDA.Fill(excelDataSet);
    //dataGridView1.DataSource = excelDataSet.Tables[0];
}

Next, filter the DataSet's DataTable using a DataView. Using a DataView's RowFilter property, you can specify subsets of rows based on their column values.

DataView prodView = new DataView(excelDataSet.Tables[0],  
"UnitsInStock <= ReorderLevel",  
"SupplierID, ProductName",  
DataViewRowState.CurrentRows); 

Ref: https://www.c-sharpcorner.com/article/dataview-in-C-Sharp/

Or you could use the DataTable's DefaultView RowFilter directly:

excelDataSet.Tables[0].DefaultView.RowFilter = "Amount >= 5000 and Amount <= 5999 and Name = 'StackOverflow'";
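To see the effect of such a filter without an Excel file, here is a small self-contained sketch on an in-memory DataTable. RowFilterDemo and its sample rows are made up for illustration; the column names Amount and Name are taken from the filter expression above.

```csharp
using System;
using System.Data;

public static class RowFilterDemo
{
    // Builds a tiny stand-in for excelDataSet.Tables[0] (sample data is invented).
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable("Sheet1");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Amount", typeof(int));
        table.Rows.Add("StackOverflow", 5500);
        table.Rows.Add("StackOverflow", 7000);
        table.Rows.Add("Other", 5200);
        return table;
    }

    // Applies the filter to the table's DefaultView and returns how many
    // rows remain visible. The underlying rows are not removed.
    public static int CountMatches(DataTable table, string filter)
    {
        table.DefaultView.RowFilter = filter;
        return table.DefaultView.Count;
    }
}
```

With the sample rows, the filter from above matches only the first row. Note that the filter changes what the view exposes; table.Rows still holds all three rows, so the whole sheet remains in memory.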
