繁体   English   中英

使用vb.net在大型excel文件上执行SQL查询的最佳方法是什么?

[英]What's the best way to execute SQL Query on large excel file using vb.net?

设置环境:

我正在使用带有.NET framework 4的vb.net开发Excel 2010应用程序级外接程序。


我的目标:

  1. 让用户输入多个名称进行搜索
  2. 使用名称列表在LARGE电子表格(30,000+行)上执行SQL查询
  3. 返回记录集并粘贴到新工作表中

表现是我​​的首要任务。 我想通过利用.NET框架了解最快的方法。

在我的代码中使用ADO连接对象可以工作,但这个过程需要很长时间(5 - 8秒)。


这是我在名为wells的表上使用的SQL查询:

    SELECT * 
    FROM wells 
    WHERE padgroup in 

    (SELECT padgroup 
     FROM wells 
     WHERE name LIKE 'TOMCHUCK 21-30'
             OR name LIKE 'FEDERAL 41-25PH')


这是表格的一部分:

Excel表


我现在正在使用此代码创建一个ADO连接对象来检索我的结果:

    'Create Recordset Object
    rsCon = CreateObject("ADODB.Connection")
    rsData = CreateObject("ADODB.Recordset")

    rsCon.Open(szConnect)
    rsData.Open(mySQLQueryToExecute, rsCon, 0, 1, 1)

    'Check to make sure data is received, then copy the data
    If Not rsData.EOF Then

        TargetRange.Cells(1, 1).CopyFromRecordset(rsData)

    Else

        MsgBox("No records returned from : " & SourceFile, vbCritical)

    End If

    'Clean up the Recordset object
    rsData.Close()
    rsData = Nothing
    rsCon.Close()
    rsCon = Nothing


根据我的了解,Excel电子表格以Open XML格式存储,.NET框架包含解析XML的本机支持。

在研究之后,我遇到了几个不同的选择:


有人可以提供一个指针,指出最好的方法吗? 我真的很感激。


补充笔记:

  • 所有查询都需要能够在不连接到在线数据库的情况下执行
  • 我只需要访问一次电子表格就可以从行中提取原始数据


现在我只是将电子表格作为项目资源嵌入。

然后,在运行时我创建文件,运行查询,将结果存储在内存中,并删除文件。

   'Create temp file path in the commonapplicationdata folder
    Dim excelsheetpath As StringBuilder

    excelsheetpath = New StringBuilder(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData))

    excelsheetpath.Append("\MasterList.xlsm")

    'Save resources into temp location in HD
    System.IO.File.WriteAllBytes(excelsheetpath.ToString, My.Resources.MasterList)

    'Now call the function to use ADO to get records from the MasterList.xlsm file just created
    GetData(excelsheetpath.ToString, "Sheet1", "A1:S40000", True, False)

    'Store the results in-memory and display by adding to a datagridview control (in a custom task pane)

    'Delete the spreadsheet
    System.IO.File.Delete(excelsheetpath.ToString())

你正在以错误的方式做VSTO;)不要在Excel中使用SQL。 如果您需要速度,请利用VSTO和本机Excel API。 您可以跳过ADODB / OLEDB图层的开销,直接使用Excel对象模型在Excel中使用超快速自动过滤器,使用SpecialCells方法将可见单元格放入多区域范围,并使用Value方法快速复制范围到数组。

以下是VSTO 2010自定义工作簿示例,可快速搜索包含“aba”,“cat”或“zon”的单词的58k单词列表

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Xml.Linq;
using Microsoft.Office.Tools.Excel;
using Microsoft.VisualStudio.Tools.Applications.Runtime;
using Excel = Microsoft.Office.Interop.Excel;
using Office = Microsoft.Office.Core;

namespace ExcelWorkbook1
{
    public partial class ThisWorkbook
    {
        private void ThisWorkbook_Startup(object sender, System.EventArgs e)
        {
            const int Sheet1 = 1; // you can use Linq to find a sheet by name if needed
            const int ColumnB = 2;
            List<List<object>> results = Query(Sheet1, ColumnB, "aba", "cat", "zon");

            foreach (List<object> record in results)
            {
                System.Diagnostics.Debug.Print("{0,-10} {1,30} {2}", record[0], record[1], record[2]);
            }
        }

        private void ThisWorkbook_Shutdown(object sender, System.EventArgs e)
        {
        }

        /// <summary>
        ///     Removes any existing Excel autofilters from the worksheet
        /// </summary>
        private void ClearFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet)
        {
            if (worksheet.AutoFilter != null)
            {
                worksheet.Cells.AutoFilter();
            }
        }

        /// <summary>
        ///     Applies an Excel Autofilter to the worksheet for search for an array of substring predicates
        /// </summary>
        private void ApplyFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet, int column, params string[] predicates)
        {
            string[] criteria = new string[predicates.Length];
            int i = 0;

            ClearFilter(worksheet);

            foreach (string value in predicates)
            {
                criteria[i++] = String.Concat("=*", value, "*");
            }

            worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr); 
        }

        /// <summary>
        ///     Returns a list of rows that are hits on a search for an array of substrings in Column B of Sheet1
        /// </summary>
        private List<List<object>> Query(int sheetIndex, int columnIndex, params string[] words)
        {
            Microsoft.Office.Interop.Excel._Worksheet worksheet;
            Excel.Range range;
            List<List<object>> records = new List<List<object>>();
            List<object> record;
            object[,] cells;
            object value;
            int row, column, rows, columns;
            bool hit;

            try
            {
                worksheet = (Microsoft.Office.Interop.Excel._Worksheet)Globals.ThisWorkbook.Sheets[sheetIndex];
                if (null == worksheet)
                {
                    return null;
                }

                // apply the autofilter
                ApplyFilter(worksheet, columnIndex, words);

                // get the 
                range = worksheet.Range["$A:$C"].SpecialCells(Excel.XlCellType.xlCellTypeVisible);
                foreach (Excel.Range subrange in range.Areas)
                {
                    // copy the cells to a multidimensional array for perfomance
                    cells = subrange.Value;

                    // transform the multidimensional array to a List
                    for (row = cells.GetLowerBound(0), rows = cells.GetUpperBound(0); row <= rows; row++)
                    {
                        record = new List<object>();
                        hit = false;

                        for (column = cells.GetLowerBound(1), columns = cells.GetUpperBound(1); column <= columns; column++)
                        {
                            value = cells[row, column];
                            hit = hit || (null != value);

                            if (hit)
                            {
                                record.Add(cells[row, column]);
                            }
                        }

                        if (hit)
                        {
                            records.Add(record);
                        }
                    }
                }
            }
            catch { }
            finally
            {
                // use GC.Collect over Marshal.ReleaseComObject() to release all RCWs per http://stackoverflow.com/a/17131389/1995977 and more
                cells = null;
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }

            return records;
        }

        #region VSTO Designer generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InternalStartup()
        {
            this.Startup += new System.EventHandler(ThisWorkbook_Startup);
            this.Shutdown += new System.EventHandler(ThisWorkbook_Shutdown);
        }

        #endregion

    }
}

Excel 2010文件不是完全XML。 获取XLSX(或XMSM)文件,并使用.zip扩展名重命名该文件。 然后将其解压缩到一个新文件夹。 子文件夹中的文件将是XML文件,是的,但实际的XLSX文件是一个zip文件,其中包含包含XML文件的文件夹集合。

我认为,您最好的选择是使用ACE驱动程序(不再支持JET)并通过ODBC访问它。 如果这还不够快,您可以在特定时间提取摘录并将其上传到可以运行查询的数据库; 查询应该更快,但可能会过时。

我的解决方案

我尝试了三种不同的方法:

  • 带SQL的ADO连接对象(最慢)
  • VSTO和Excel的Autofilter(可靠)
  • LINQ to XML(最快)

LINQ to XML提供了最佳性能。 我将我的表转换为XML文件:

XML表


然后,在我的代码中,我使用StringReader引入XMLwellData文件(将其保存为项目资源)。

    'welldoc will be the file to do queries on using LINQ to XML
    Dim stream As System.IO.StringReader
    stream = New StringReader(My.Resources.XMLwellData)

    welldoc = XDocument.Load(stream)

    'clean up stream now that it's no longer needed
    stream.Close()
    stream.Dispose()


    '***** later in the code perform my query on XML file *********

        Dim query = _
     From well In welldoc.<wellList>.<well> _
     Where well.<name>.Value Like "TOMCHUCK 21-30" _
     Select well

     For Each well in query

        MessageBox.Show(well.<padgroup>.value)

     Next

做我想做的事情简直太简单了,最棒的是它很快。

感谢您的所有帮助和建议。 这对我来说很有意义。


使用Excel的自动过滤器的替代方法


如果您尝试使用其他答案中建议的代码,则只会过滤两个值:

    worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr);


因此,要使用Excel的Auotfilter过滤多个条件 ,必须将参数作为数组传递并过滤xlFilterValues。


    Dim wrkbk As Excel.Workbook
    Dim wrksht As Excel.Worksheet
    Dim myRange As Excel.Range
    Dim cell As Excel.Range

    'You would add all of your wellnames to search to this List
    Dim wellNames As New List(Of String)

    wrksht = wrkbk.Sheets(1)

    'In my excel file there is a Named Range which includes the all the information
    myRange = wrksht.Range("WellLookUpTable")

    'Notice, for `Criteria1:=` you MUST convert the List to an array
    With wrksht.Range("WellLookUpTable")
        .AutoFilter(Field:=2, Criteria1:=wellNames.ToArray, Operator:=Excel.XlAutoFilterOperator.xlFilterValues)
    End With

    myRange = wrksht.Range("A2", wrksht.Range("A2").End(Excel.XlDirection.xlDown)).Cells.SpecialCells(Excel.XlCellType.xlCellTypeVisible)

    For Each cell In myRange

        'column 11 is padgroup
        MessageBox.Show(cell.Offset(0, 11).Value)

    Next

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM