
Export a large data query (60k+ rows) to Excel

I created a reporting tool as part of an internal web application. The report displays all results in a GridView, and I used JavaScript to read the contents of the GridView row by row into an Excel object. The JavaScript then goes on to create a PivotTable on a different worksheet.

Unfortunately, I didn't anticipate that the size of the GridView would overload the browser when more than a few days of data are returned. The application generates a few thousand records per day (roughly 60k per month), and ideally I'd like to be able to return all results for up to a year. The number of rows causes the browser to hang or crash.
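Putting numbers on that target makes the constraint concrete (the worksheet row limits are the documented Excel figures, not something measured in this application):

```latex
12 \times 60{,}000 = 720{,}000 \ \text{rows per year}, \qquad 65{,}536 < 720{,}000 < 1{,}048{,}576
```

So a year of data would not fit on a single worksheet in the older .xls format (Excel 97 to 2003, 65,536 rows), but it does fit on one worksheet in the .xlsx format used by Excel 2007 and later (1,048,576 rows).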

We're using ASP.NET 3.5 in Visual Studio 2010 with SQL Server, and the expected browser is IE8. The report consists of a GridView that gets its data from one of a handful of stored procedures, depending on which population the user chooses. The GridView is in an UpdatePanel:

<asp:UpdatePanel ID="update_ResultSet" runat="server">
<Triggers>
    <asp:AsyncPostBackTrigger ControlID="btn_Submit" />
</Triggers>
<ContentTemplate>
<asp:Panel ID="pnl_ResultSet" runat="server" Visible="False">
    <div runat="server" id="div_ResultSummary">
        <p>This Summary Section is Automatically Completed from Code-Behind</p>
    </div>
    <asp:GridView ID="gv_Results" runat="server" 
        HeaderStyle-BackColor="LightSkyBlue" 
        AlternatingRowStyle-BackColor="LightCyan"  
        Width="100%">
    </asp:GridView>
</asp:Panel>
</ContentTemplate>
</asp:UpdatePanel>

I was relatively new to my team, so I followed their typical practice of returning the stored procedure results as a DataTable and using that as the DataSource in the code-behind:

    // Call the stored procedure through the data context and materialize the results
    List<USP_Report_AreaResult> areaResults = db.USP_Report_Area(ddl_Line.Text, ddl_Unit.Text, ddl_Status.Text, ddl_Type.Text, ddl_Subject.Text, minDate, maxDate).ToList();
    dtResults = Common.LINQToDataTable(areaResults);  // convert the list to a DataTable

    if (dtResults.Rows.Count > 0)
    {
        PopulateSummary(ref dtResults);
        gv_Results.DataSource = dtResults;
        gv_Results.DataBind();
    }

(I know what you're thinking! But yes, I have learned much more about parameterization since then.)

The LINQToDataTable function isn't anything special; it just converts a list to a DataTable.

With a few thousand records (up to a few days' worth), this works fine. The GridView displays the results, and there's a button the user can click to launch the JScript exporter. The external JavaScript function reads each row into an Excel sheet and then uses that sheet to create a PivotTable. The PivotTable is important!

function exportToExcel(sMyGridViewName, sTitleOfReport, sHiddenCols) {
    // sMyGridViewName = the client ID of the GridView, supplied as text
    // sTitleOfReport  = used as the page header if the spreadsheet is printed
    // sHiddenCols     = columns to hide when sent to Excel, separated by semicolons (e.g. "1;3;5").
    //                   Supply an empty string if all columns are visible.

    var oMyGridView = document.getElementById(sMyGridViewName);

    // If the GridView was not rendered (no data), display an alert.
    if (oMyGridView == null) {
        alert('No data for report');
        return;
    }

    var oHid = sHiddenCols.split(";");  // array of column indexes to hide
    var oExcel = new ActiveXObject("Excel.Application");  // IE-only Excel automation
    var oBook = oExcel.Workbooks.Add();
    var oSheet = oBook.Worksheets(1);
    var iRow = 0;

    // Export all visible rows of the rendered HTML table to Excel.
    for (var y = 0; y < oMyGridView.rows.length; y++) {
        if (oMyGridView.rows[y].style.display == '') {
            var iCol = 0;
            for (var x = 0; x < oMyGridView.rows[y].cells.length; x++) {
                var bHid = false;
                for (var iHidCol = 0; iHidCol < oHid.length; iHidCol++) {
                    if (oHid[iHidCol].length != 0 && parseInt(oHid[iHidCol], 10) == x) {
                        bHid = true;
                        break;
                    }
                }
                if (!bHid) {
                    // One COM call into Excel per cell; with tens of thousands of rows this adds up
                    oSheet.Cells(iRow + 1, iCol + 1).Value = oMyGridView.rows[y].cells[x].innerText;
                    iCol++;
                }
            }
            iRow++;
        }
    }

    // ... the function goes on to build the PivotTable on a different worksheet ...
}
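The hidden-column check above rescans the whole sHiddenCols list once for every cell. A minimal sketch, assuming the same "1;3;5" format, that parses the list once into a lookup object so each cell needs a single property test. parseHiddenCols and isHidden are hypothetical helper names, not part of the original code, and the sketch sticks to ES3 syntax so it would also run in IE8:

```javascript
// Hypothetical helper: parse "1;3;5" once into { 1: true, 3: true, 5: true }.
// Empty entries (from "" or a trailing ";") are ignored.
function parseHiddenCols(sHiddenCols) {
    var lookup = {};
    var parts = sHiddenCols.split(";");
    for (var i = 0; i < parts.length; i++) {
        if (parts[i].length !== 0) {
            lookup[parseInt(parts[i], 10)] = true;
        }
    }
    return lookup;
}

// Inside the cell loop, the inner for-loop then becomes one test:
//     if (!isHidden(oHidden, x)) { ...write the cell... }
function isHidden(lookup, columnIndex) {
    return lookup[columnIndex] === true;
}

var oHidden = parseHiddenCols("1;3;5");
isHidden(oHidden, 3); // true
isHidden(oHidden, 2); // false
```

The lookup is built once per export, so the per-cell cost no longer grows with the number of hidden columns.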

What I'm trying to do: create a solution (probably client-side) that can handle this volume of data and process it into Excel. Someone might suggest using the HtmlTextWriter, but AFAIK that doesn't allow for automatically generating a PivotTable, and it triggers an obnoxious pop-up warning.

What I've tried:

  • Populating a JSON object: I still think this has potential, but I haven't found a way to make it work.
  • Using a SqlDataSource: I can't seem to get any data back out of it.
  • Paginating and looping through the pages: mixed progress. It's generally ugly, though, and I still have the problem that the entire dataset is queried and returned for each page displayed.

Update: I'm still very open to alternative solutions, but I've been pursuing the JSON approach. I have a working server-side method that generates the JSON object from a DataTable, but I can't figure out how to pass that JSON into the (external) exportToExcel JavaScript function:

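    protected static string ConstructReportJSON(ref DataTable dtResults)
    {
        // Emits a JavaScript statement: var sJSON = [ {...}, {...} ];
        StringBuilder sb = new StringBuilder();
        sb.Append("var sJSON = [");
        for (int r = 0; r < dtResults.Rows.Count; r++)
        {
            if (r > 0) sb.Append(",");          // separator between row objects
            sb.Append("{");
            for (int c = 0; c < dtResults.Columns.Count; c++)
            {
                if (c > 0) sb.Append(",");      // separator between fields
                sb.AppendFormat("\"{0}\":\"{1}\"",
                    EscapeJson(dtResults.Columns[c].ColumnName),
                    EscapeJson(dtResults.Rows[r][c].ToString()));
            }
            sb.Append("}");
        }
        sb.Append("];");   // an empty table now yields "var sJSON = [];"
        return sb.ToString();
    }

    // Escape characters that would otherwise break the string literals
    private static string EscapeJson(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"")
                .Replace("\r", "\\r").Replace("\n", "\\n").Replace("\t", "\\t");
    }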

Can anybody show an example of how to pass this JSON object into an external JS function? Or suggest any other solution for exporting to Excel.

It's easy and efficient to write CSV files. However, if you need Excel, it can also be done reasonably efficiently, handling 60,000+ rows, by using the Microsoft Open XML SDK's OpenXmlWriter.

  1. Install the Microsoft Open XML SDK if you don't have it already (search for "download microsoft open xml sdk").
  2. Create a Console App.
  3. Add a reference to DocumentFormat.OpenXml.
  4. Add a reference to WindowsBase.
  5. Try running some test code like the example below (you will need a few using directives).

Check out Vincent Tan's solution at http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/ (below, I cleaned up his example slightly to help new users).

In my own use I found this pretty straightforward with regular data, but I did have to strip out "\0" (null) characters from my real data.
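The null-character problem comes from XML itself: XML 1.0 does not allow U+0000 or most other control characters anywhere in a document, so they cannot be written into an .xlsx part. A minimal sketch of that cleanup, written in JavaScript for consistency with the page's client code (the same character class also works with .NET's Regex.Replace). The helper name is hypothetical, and it only covers the C0 control range, not lone surrogates or U+FFFE/U+FFFF:

```javascript
// Hypothetical helper: remove C0 control characters that XML 1.0 forbids.
// Tab (\x09), line feed (\x0A) and carriage return (\x0D) are allowed, so they are kept.
// Lone surrogates and U+FFFE/U+FFFF are also invalid in XML but are not handled here.
function stripInvalidXmlChars(value) {
    return String(value).replace(/[\x00-\x08\x0B\x0C\x0E-\x1F]/g, "");
}

stripInvalidXmlChars("abc\u0000def");   // "abcdef"
stripInvalidXmlChars("col1\tcol2\r\n"); // unchanged: "col1\tcol2\r\n"
```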

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

...

        using (var workbook = SpreadsheetDocument.Create("SomeLargeFile.xlsx", SpreadsheetDocumentType.Workbook))
        {
            List<OpenXmlAttribute> attributeList;
            OpenXmlWriter writer;

            workbook.AddWorkbookPart();
            WorksheetPart workSheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();

            writer = OpenXmlWriter.Create(workSheetPart);
            writer.WriteStartElement(new Worksheet());
            writer.WriteStartElement(new SheetData());

            for (int i = 1; i <= 50000; ++i)
            {
                attributeList = new List<OpenXmlAttribute>();
                // this is the row index
                attributeList.Add(new OpenXmlAttribute("r", null, i.ToString()));

                writer.WriteStartElement(new Row(), attributeList);

                for (int j = 1; j <= 100; ++j)
                {
                    attributeList = new List<OpenXmlAttribute>();
                    // this is the data type ("t"), with CellValues.String ("str")
                    attributeList.Add(new OpenXmlAttribute("t", null, "str"));

                    // it's suggested you also have the cell reference, but
                    // you'll have to calculate the correct cell reference yourself.
                    // Here's an example:
                    //attributeList.Add(new OpenXmlAttribute("r", null, "A1"));

                    writer.WriteStartElement(new Cell(), attributeList);

                    writer.WriteElement(new CellValue(string.Format("R{0}C{1}", i, j)));

                    // this is for Cell
                    writer.WriteEndElement();
                }

                // this is for Row
                writer.WriteEndElement();
            }

            // this is for SheetData
            writer.WriteEndElement();
            // this is for Worksheet
            writer.WriteEndElement();
            writer.Close();

            writer = OpenXmlWriter.Create(workbook.WorkbookPart);
            writer.WriteStartElement(new Workbook());
            writer.WriteStartElement(new Sheets());

            // you can use object initialisers like this only when the properties
            // are actual properties. SDK classes sometimes have property-like properties
            // but are actually classes. For example, the Cell class has the CellValue
            // "property" but is actually a child class internally.
            // If the properties correspond to actual XML attributes, then you're fine.
            writer.WriteElement(new Sheet()
            {
                Name = "Sheet1",
                SheetId = 1,
                Id = workbook.WorkbookPart.GetIdOfPart(workSheetPart)
            });

            writer.WriteEndElement(); // Write end for Sheets element
            writer.WriteEndElement(); // Write end for Workbook element
            writer.Close();

            workbook.Close();
        }

If you review that code, you'll notice two major writes: first the sheet, and later the workbook that contains the sheet. The workbook part at the end is the boring part; the earlier sheet part contains all the rows and columns.

In your own adaptation, you would write real string values from your own data into the cells. Above, we're just using the row and column numbers instead:

writer.WriteElement(new CellValue("SomeValue"));

Worth noting: row numbering in Excel starts at 1, not 0. Numbering rows from an index of zero will lead to "Corrupt file" error messages.
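The commented-out line in the writer loop suggests emitting an explicit cell reference such as "A1" and leaves the calculation to you. With 1-based rows and columns, the column letters form a bijective base-26 number (A to Z, then AA). A sketch of that calculation in JavaScript; cellReference is a hypothetical name, and the same loop ports directly to C# inside the j loop:

```javascript
// Build an A1-style reference from 1-based row and column numbers.
// Columns use bijective base 26: 1 -> A, 26 -> Z, 27 -> AA, 702 -> ZZ, 703 -> AAA.
function cellReference(row, column) {
    if (row < 1 || column < 1) {
        // Excel rows and columns start at 1; there is no row or column 0
        throw new RangeError("row and column must be >= 1");
    }
    var letters = "";
    var n = column;
    while (n > 0) {
        n--;                                    // shift to 0-based for this digit
        letters = String.fromCharCode(65 + (n % 26)) + letters;
        n = Math.floor(n / 26);
    }
    return letters + row;
}

cellReference(1, 1);       // "A1"
cellReference(50000, 100); // "CV50000"
```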

Lastly, if you're working with very large sets of data, never call ToList(). Use a data-reader-style approach that streams the data. For example, you could keep an IQueryable and iterate over it in a foreach loop. You never want to rely on having all the data in memory at the same time, or you'll hit out-of-memory limitations and/or high memory utilization.
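The streaming advice is really about the shape of the loop: pull one row, write it, move on. A JavaScript analogy (ES2015, so Node or a modern browser rather than IE8), with a generator standing in for a data reader. This is not the ADO.NET API, and fakeReader and writeRowsStreaming are hypothetical names:

```javascript
// Stand-in for a data reader: yields rows one at a time instead of building a list.
function* fakeReader(rowCount) {
    for (let i = 1; i <= rowCount; i++) {
        yield { id: i, value: "R" + i };
    }
}

// Consume any iterable row by row; only the current row is referenced inside the loop.
// writeLine stands for whatever sink you use (response stream, file, OpenXmlWriter calls).
function writeRowsStreaming(rows, writeLine) {
    let count = 0;
    for (const row of rows) {
        writeLine(row.id + "," + row.value);
        count++;
    }
    return count;
}

// Usage: the lines are collected here only to show the output.
const lines = [];
writeRowsStreaming(fakeReader(3), (line) => lines.push(line));
// lines is now ["1,R1", "2,R2", "3,R3"]
```

In C#, the equivalent is writing each row inside the foreach (or while (reader.Read())) loop instead of building a List first.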

I would try using displaytag to display the results. You could set it up to display a certain number of rows per page, which should solve your overloading issue. Then you can configure displaytag to allow an Excel export.
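Whatever control does the display, paging only relieves the browser if each request fetches just one page's rows, which also addresses the question's complaint that the whole dataset is queried for every page. A sketch of the bounds arithmetic, assuming 1-based page numbers and a 1-based row number such as SQL Server's ROW_NUMBER() produces; pageBounds is a hypothetical helper:

```javascript
// Compute the 1-based first/last row numbers for a page, clamped to the total.
function pageBounds(page, pageSize, totalRows) {
    var pageCount = Math.ceil(totalRows / pageSize);
    var first = (page - 1) * pageSize + 1;
    var last = Math.min(page * pageSize, totalRows);
    return { pageCount: pageCount, first: first, last: last };
}

pageBounds(1, 500, 60000); // { pageCount: 120, first: 1, last: 500 }
pageBounds(3, 500, 1201);  // { pageCount: 3, first: 1001, last: 1201 }
```

The first/last values would then be passed as parameters to the stored procedure, e.g. for a filter on the row number between the two bounds.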

We typically handle this with an "Export" command button wired to a server-side method that grabs the dataset and converts it to CSV. Then we adjust the response headers so the browser treats the response as a download. I know this is a server-side solution, but you may want to consider it, since you'll keep having timeout and browser issues until you implement server-side record paging.
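The conversion step is where CSV exports usually go wrong: fields containing commas, quotes or line breaks must be wrapped in double quotes, with embedded quotes doubled (the RFC 4180 convention, which Excel understands). The answer's own conversion runs server side; this is a JavaScript sketch of just the quoting rule, with hypothetical function names:

```javascript
// Quote a single field per RFC 4180: wrap it in quotes if it contains , " CR or LF,
// and double any embedded quotes. null/undefined become empty fields.
function csvField(value) {
    var s = value == null ? "" : String(value);
    if (/[",\r\n]/.test(s)) {
        return '"' + s.replace(/"/g, '""') + '"';
    }
    return s;
}

// Join rows (arrays of values) into CSV text with CRLF line endings.
function toCsv(rows) {
    var lines = [];
    for (var r = 0; r < rows.length; r++) {
        var fields = [];
        for (var c = 0; c < rows[r].length; c++) {
            fields.push(csvField(rows[r][c]));
        }
        lines.push(fields.join(","));
    }
    return lines.join("\r\n");
}

toCsv([["Caller", "Office"], ["Doe, John", 'say "hi"']]);
// 'Caller,Office\r\n"Doe, John","say ""hi"""'
```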

Almost a week and a half after I started on this problem, I've finally managed to get it all working to some extent. I'll hold off on marking an answer for now to see if anybody else has a more efficient method that better follows best practices.

By generating a JSON string, I've decoupled the JavaScript from the GridView. The JSON is generated in the code-behind when the data is populated:

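    protected static string ConstructReportJSON(ref DataTable dtResults)
    {
        // Builds a JSON array of flat objects: [ {"Col":"value", ...}, ... ]
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < dtResults.Rows.Count; r++)
        {
            if (r > 0) sb.Append(",");          // separator between row objects
            sb.Append("{");
            for (int c = 0; c < dtResults.Columns.Count; c++)
            {
                if (c > 0) sb.Append(",");      // separator between fields
                sb.AppendFormat("\"{0}\":\"{1}\"",
                    EscapeJson(dtResults.Columns[c].ColumnName),
                    EscapeJson(dtResults.Rows[r][c].ToString()));
            }
            sb.Append("}");
        }
        // An empty table now yields "[]" instead of throwing on sb.Remove(-1, 1)
        return String.Format("[{0}]", sb.ToString());
    }

    // Escape characters that would otherwise break the JSON string literals
    private static string EscapeJson(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"")
                .Replace("\r", "\\r").Replace("\n", "\\n").Replace("\t", "\\t");
    }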

This returns a string of data such as:

    [ {"Caller":"John Doe", "Office":"5555", "Type":"Incoming", etc},
      {"Caller":"Jane Doe", "Office":"7777", "Type":"Outgoing", etc}, {etc} ]

I've hidden this string by assigning the text to a Literal inside a hidden div in the UpdatePanel:

    <div id="div_JSON" style="display: none;">
            <asp:Literal id="lit_JSON" runat="server" /> 
    </div>

And the JavaScript parses that output by reading the contents of the div:

function exportToExcel_Pivot(sMyJSON, sTitleOfReport, sReportPop) {
    // sMyJSON        = the ID, supplied as text, of the hidden element that houses the JSON array
    // sTitleOfReport = used as the page header if the spreadsheet is printed
    // sReportPop     = determines which business logic to create a pivot table for

    var sJSON = document.getElementById(sMyJSON).innerHTML;
    // Prefer the native parser (available in IE8 standards mode); fall back to eval otherwise
    var oJSON = (window.JSON && JSON.parse) ? JSON.parse(sJSON) : eval("(" + sJSON + ")");

    // DEBUG example test code
    // for (var x = 0; x < oJSON.length; x++) {
    //     for (var y in oJSON[x]) {
    //         alert(oJSON[x][y]); // DEBUG, returns field value
    //         alert(y);           // DEBUG, returns column name
    //     }
    // }

    // If the JSON array is missing or empty, display an alert.
    if (oJSON == null || oJSON.length == 0) {
        alert('No data for report');
        return;
    }

    var oExcel = new ActiveXObject("Excel.Application");
    var oBook = oExcel.Workbooks.Add();
    var oSheet = oBook.Worksheets(1);
    var oSheet2 = oBook.Worksheets(2);
    var iRow = 0;
    var iCol = 0;

    // Write the property names of the first JSON object as column headers
    for (var header in oJSON[0]) {
        oSheet.Cells(iRow + 1, iCol + 1).Value = header;
        iCol++;
    }
    iRow++;

    // Export all rows of the JSON array to Excel
    for (var r = 0; r < oJSON.length; r++) {
        iCol = 0;
        for (var c in oJSON[r]) {
            oSheet.Cells(iRow + 1, iCol + 1).Value = oJSON[r][c];
            iCol++;
        } // end column loop
        iRow++;
    } // end row loop

    // ... the function goes on to build the PivotTable on another worksheet ...
}
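Separating the data shaping from the ActiveX calls makes it testable outside IE, and it guards against rows whose properties are missing or enumerated in a different order. A sketch, assuming the array-of-flat-objects format shown above; jsonToRows is a hypothetical helper that takes the header order from the first object and looks every row up by those names:

```javascript
// Turn [{Caller: ..., Office: ...}, ...] into [[headers], [values], ...].
// Column order comes from the first object; missing properties become "".
function jsonToRows(oJSON) {
    var rows = [];
    if (!oJSON || oJSON.length === 0) {
        return rows;
    }
    var headers = [];
    for (var key in oJSON[0]) {
        if (Object.prototype.hasOwnProperty.call(oJSON[0], key)) {
            headers.push(key);
        }
    }
    rows.push(headers);
    for (var r = 0; r < oJSON.length; r++) {
        var values = [];
        for (var h = 0; h < headers.length; h++) {
            var v = oJSON[r][headers[h]];
            values.push(v === undefined ? "" : v);
        }
        rows.push(values);
    }
    return rows;
}

var rows = jsonToRows([
    { Caller: "John Doe", Office: "5555", Type: "Incoming" },
    { Office: "7777", Caller: "Jane Doe", Type: "Outgoing" }
]);
// rows[0] -> ["Caller", "Office", "Type"]; rows[2] -> ["Jane Doe", "7777", "Outgoing"]
```

The Excel loop then only writes rows[i][j] into Cells(i + 1, j + 1).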

The string output and the JavaScript eval parsing both work surprisingly fast, but looping through the JSON object is a little slower than I'd like.

I believe this method would be limited to around 1 billion characters of data, maybe less depending on how memory testing works out. (I've calculated that I'll probably be looking at a maximum of 1 million characters per day, so a full year of reporting should be fine.)
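A quick check of that estimate, assuming the 1 million characters/day figure above and 2 bytes per character (both .NET and JScript strings are UTF-16):

```latex
365 \times 10^{6} = 3.65 \times 10^{8} \ \text{characters} < 10^{9}, \qquad 3.65 \times 10^{8} \times 2 \ \text{bytes} = 7.3 \times 10^{8} \ \text{bytes} \approx 730 \ \text{MB}
```

So a year stays under the roughly 1 billion character ceiling, but each in-memory copy of the full string is on the order of 730 MB.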

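Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.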

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM