Excel performance of pivot table aggregations

Question

Whenever I use Excel, I am surprised at how well it performs the following two aggregation operations:

Date/time aggregations.
Case-insensitive aggregations.

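How does Excel achieve this performance? Does it store additional pivot-related information and data structures for aggregation? Is this documented anywhere, or where can I find out more about it? I have looked at the LibreOffice source code, but the actual product does not even come close to Excel in aggregation/pivot performance.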

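It would be great if someone who knows Excel could share more about the low-level aggregation behavior or structures that Excel uses to achieve this performance. For example, does it store labels twice: once in their native case and once lowercased for aggregation purposes? I know this question is rather broad and more conceptual than a code answer, but I hope the answers can serve as a good reference on ways to optimize Excel-style aggregation performance.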
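
To make the "store the labels twice" idea concrete, here is a minimal Python sketch of what I have in mind. It is only a guess at one possible layout and is not taken from Excel; the names `build_label_index` and `aggregate` are made up for illustration. Each distinct label is kept once in its first-seen spelling, a case-folded key maps to its slot, and rows carry only the integer slot, so the summing loop never compares strings.

```python
def build_label_index(labels):
    """Dictionary-encode labels case-insensitively (hypothetical layout).

    Returns (display, codes): display[i] is the first-seen spelling of
    distinct label i, and codes[r] is the slot number for row r.
    """
    slot_by_key = {}   # case-folded label -> slot number
    display = []       # slot number -> label in its original case
    codes = []
    for label in labels:
        key = label.casefold()
        slot = slot_by_key.get(key)
        if slot is None:
            slot = len(display)
            slot_by_key[key] = slot
            display.append(label)
        codes.append(slot)
    return display, codes


def aggregate(display, codes, values):
    """Sum values per slot; only integers are used as keys in this loop."""
    totals = [0] * len(display)
    for slot, value in zip(codes, values):
        totals[slot] += value
    return dict(zip(display, totals))


display, codes = build_label_index(["AZ", "az", "CA", "Az", "ca"])
result = aggregate(display, codes, [10, 5, 7, 1, 3])
print(result)  # {'AZ': 16, 'CA': 10}
```

With this layout the string work happens once per row when the index is built, and every later re-aggregation touches only small integers.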

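Following some suggestions from ARGeo, I noticed the following:

(1) There are two files related to the Pivot Cache. Definitions (field-level information):

(2) and Records (row/cell-level information):

A few questions then: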

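How does Excel decide when to store a value as-is and when to store it as a shared record? For example, why is the value in B2, "LifeLock" (a mixed-case string), stored as-is, while the value "AZ" in F2 is stored as a sharedItems reference (v="0")?
Is there any documentation on the internal C/C++ struct that Excel uses in memory for the pivotCache (as opposed to the various XML documents that are stored)?
Is there any information on how Excel internally uses the "helper information" stored at the field level? For example, this information: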

<cacheField name="numEmps" numFmtId="0"><sharedItems containsString="0" containsBlank="1" containsNumber="1" containsInteger="1" minValue="0" maxValue="20000"/></cacheField>

Answer 1

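Pivot Table performance is based on the Pivot Cache. Although there is very little information on this topic (I mean a lack of official documentation), I found some interesting posts and MS documentation.

Definition:

The Pivot Cache is a special memory area where Pivot Table records are kept.

When you create a Pivot Table, Excel takes a copy of the source data and stores it in the Pivot Cache. The Pivot Cache is held in Excel's memory. You cannot see it, but it is the data the Pivot Table refers to when it builds the table.

This enables Excel to be very responsive to changes in the Pivot Table, but it can also double the size of your file, since the Pivot Cache is just a copy of the source data.

Use this link and this link as starting reference points for more information.

You can also read about the Pivot Cache in the Excel 101 and Excel Pivot Cache 101 posts to find out what it is and what side effects it has.

Below are some code snippets showing how to use the PivotCache object.

Here is code written in C# that creates an Excel workbook with Pivot Tables, which of course use the Pivot Cache (DataTools is a helper namespace that is not shown here):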

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;
using Excel = Microsoft.Office.Interop.Excel;
using System.IO;
using System.Diagnostics;
using System.Configuration;
using System.Data.SqlClient;
using System.Data;

namespace ConsoleApplication1 {

    class Program {

        static void Main(string[] args) {

            Excel.Application objApp;
            Excel.Workbook objBook;
            Excel.Sheets objSheets;
            Excel.Workbooks objBooks;

            string command = (@"SELECT * FROM dbo.Client");

            using (SqlConnection connection = new SqlConnection(GetConnectionStringByName("CubsPlus"))) {

                DataTable data = new DataTable();

                try {
                    connection.Open();
                }
                catch (Exception e) {
                    StackTrace st = new StackTrace(new StackFrame(true));
                    StackFrame sf = st.GetFrame(0);
                    Console.WriteLine (e.Message + "\n" + "Method" + sf.GetMethod().ToString() + "\n" + "Line" + sf.GetFileLineNumber().ToString());
                }
                try {
                    data = DataTools.SQLQueries.getDataTableFromQuery(connection, command);

                    if (data == null) {
                        throw new ArgumentNullException();
                    }
                }
                catch (Exception e) {

                    StackTrace st = new StackTrace(new StackFrame(true));
                    StackFrame sf = st.GetFrame(0);
                    Console.WriteLine (e.Message + "\n" + "Method" + sf.GetMethod().ToString() + "\n" + "Line" + sf.GetFileLineNumber().ToString());
                }

                objApp = new Excel.Application();

                try {     
                    objBooks = objApp.Workbooks;
                    objBook = objApp.Workbooks.Add(Missing.Value);
                    objSheets = objBook.Worksheets;

                    Excel.Worksheet sheet1 = (Excel.Worksheet)objSheets[1];
                    sheet1.Name = "ACCOUNTS";
                    string message = DataTools.Excel.copyDataTableToExcelSheet(data, sheet1);

                    if (message != null) {
                        Console.WriteLine("Problem importing the data to Excel");
                        Console.WriteLine(message);
                        Console.ReadLine();
                    }

                    //CREATE A PIVOT CACHE BASED ON THE EXPORTED DATA
                    Excel.PivotCache pivotCache = objBook.PivotCaches().Add(Excel.XlPivotTableSourceType.xlDatabase,sheet1.UsedRange);

                    Console.WriteLine(pivotCache.SourceData.ToString());

                    Console.ReadLine();

                    //WORKSHEET FOR NEW PIVOT TABLE
                    Excel.Worksheet sheet2 = (Excel.Worksheet)objSheets[2];
                    sheet2.Name = "PIVOT1";

                    //PIVOT TABLE BASED ON THE PIVOT CACHE OF EXPORTED DATA
                    Excel.PivotTables pivotTables = (Excel.PivotTables)sheet2.PivotTables(Missing.Value);
                    Excel.PivotTable pivotTable = pivotTables.Add(pivotCache, objApp.ActiveCell, "PivotTable1", Missing.Value, Missing.Value);

                    pivotTable.SmallGrid = false;
                    pivotTable.TableStyle = "PivotStyleLight1";

                    //ADDING PAGE FIELD
                    Excel.PivotField pageField = (Excel.PivotField)pivotTable.PivotFields("ParentName");
                    pageField.Orientation = Excel.XlPivotFieldOrientation.xlPageField;

                    //ADDING ROW FIELD
                    Excel.PivotField rowField = (Excel.PivotField)pivotTable.PivotFields("State");
                    rowField.Orientation = Excel.XlPivotFieldOrientation.xlRowField;

                    //ADDING DATA FIELD
                    pivotTable.AddDataField(pivotTable.PivotFields("SetupDate"), "average setup date", Excel.XlConsolidationFunction.xlAverage);

                    ExcelSaveAs(objApp, objBook, @"J:\WBK");

                    objApp.Quit();
                }     
                catch (Exception e) {

                    objApp.Quit();
                    Console.WriteLine(e.Message);
                    Console.ReadLine();
                }
            }
        }

        static string ExcelSaveAs(Excel.Application objApp, Excel.Workbook objBook, string path) {
            try {
                objApp.DisplayAlerts = false;
                objBook.SaveAs(path, Excel.XlFileFormat.xlExcel7, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Excel.XlSaveAsAccessMode.xlNoChange, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value);
                objApp.DisplayAlerts = true;
                return null;
            }
            catch (Exception e) {
                StackTrace st = new StackTrace(new StackFrame(true));
                StackFrame sf = st.GetFrame(0);
                return (e.Message + "\n" + "Method" + sf.GetMethod().ToString() + "\n" + "Line" + sf.GetFileLineNumber().ToString());
            }
        }
        static string GetConnectionStringByName(string name) {
            //ASSUME FAILURE
            string returnValue = null;

            //Look for the name in the connectionStrings section
            ConnectionStringSettings settings = ConfigurationManager.ConnectionStrings[name];

            // If found, return the connection string
            if (settings != null) {
                returnValue = settings.ConnectionString;
            }
            return returnValue;
        }
    }
}

Here is code written in VBA that creates a new Pivot Cache for the selected Pivot Table:

Sub SelPTNewCache()

    Dim wsTemp As Worksheet
    Dim pt As PivotTable

    On Error Resume Next
    Set pt = ActiveCell.PivotTable

    If pt Is Nothing Then
        MsgBox "Active cell is not in a pivot table"
    Else
        Set wsTemp = Worksheets.Add

        ActiveWorkbook.PivotCaches.Create( _
            SourceType:=xlDatabase, _
            SourceData:=pt.SourceData).CreatePivotTable _
            TableDestination:=wsTemp.Range("A3"), _
            TableName:="PivotTableTemp"

        pt.CacheIndex = wsTemp.PivotTables(1).CacheIndex

        Application.DisplayAlerts = False
        wsTemp.Delete
        Application.DisplayAlerts = True
    End If

exitHandler:
        Set pt = Nothing

End Sub

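1. The pivot cache records file (pivotCacheRecords) uses the following elements and attributes:

- s stands for a string value

- n stands for a numeric value

- d stands for a date value

- x stands for an index value

- v holds the value itself

So, let's translate into human language the data contained in cell F2 of this table: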

<x v="0"/>

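The value 0 is the zero index in the array of strings where the US state abbreviations are stored. The first entry of this array gives us Arizona (AZ). I don't know why the cell in the next row contains lowercase az while all the other cells contain uppercase AZ, but I'm sure it has nothing to do with the Shared Record.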
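
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two XML fragments are simplified and hypothetical (real files carry an XML namespace and many more fields), and `decode_record` is a made-up helper name. Only the element names s, n, x and the attribute v come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified shared-items list for the "State" field.
SHARED_ITEMS_XML = """
<sharedItems count="2">
  <s v="AZ"/>
  <s v="CA"/>
</sharedItems>
"""

# Hypothetical record: an inline string, an inline number, a shared index.
RECORD_XML = """
<r>
  <s v="LifeLock"/>
  <n v="1500"/>
  <x v="0"/>
</r>
"""


def decode_record(record, shared_by_field):
    """Turn one <r> record into plain Python values.

    shared_by_field maps a field position to its list of shared items;
    an <x> element in that position is an index into that list.
    """
    row = []
    for position, cell in enumerate(record):
        raw = cell.get("v")
        if cell.tag == "x":
            row.append(shared_by_field[position][int(raw)])
        elif cell.tag == "n":
            row.append(float(raw))
        else:  # "s" and anything else: keep the text as stored
            row.append(raw)
    return row


state_items = [item.get("v") for item in ET.fromstring(SHARED_ITEMS_XML.strip())]
record = ET.fromstring(RECORD_XML.strip())
row = decode_record(record, {2: state_items})
print(row)  # ['LifeLock', 1500.0, 'AZ']
```

The string "LifeLock" is read directly from the record, while the state comes from the shared list through its index.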

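2. I did not find any useful information about the internal C/C++ struct that Excel uses in memory for its pivotCache.

Finally:

3. Here is a LINK containing useful information about the "helper information" in your third question.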
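
As a rough illustration of what such field-level helper information summarizes, the following Python sketch derives the same flags as the cacheField element quoted in the question from a column of values. The attribute names come from that XML; the function `summarize_field` and the exact rules it applies are assumptions for illustration, not Excel's documented behavior.

```python
def summarize_field(values):
    """Compute sharedItems-style flags for one column (assumed semantics).

    None stands for a blank cell. bool is excluded from the numeric check
    because Python treats True/False as integers.
    """
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    summary = {
        "containsString": any(isinstance(v, str) for v in values),
        "containsBlank": any(v is None for v in values),
        "containsNumber": bool(numbers),
        "containsInteger": bool(numbers)
        and all(float(n).is_integer() for n in numbers),
    }
    if numbers:
        summary["minValue"] = min(numbers)
        summary["maxValue"] = max(numbers)
    return summary


summary = summarize_field([0, 250, None, 20000, 12])
print(summary)
```

A reader of the cache could plausibly use such flags to choose a fast path, for example skipping string handling entirely when containsString is false, though whether Excel does so is not something I can confirm.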

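P.S.

About Big O notation.

Big O notation is used in computer science to describe the performance or complexity of an algorithm. It gives an upper bound on how the running time or the space used (in memory or on disk) grows with the size of the input, and it is most often quoted for the worst case.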

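O(1) describes an algorithm that always executes in the same time, regardless of the size of the input data set.
O(N) describes an algorithm whose running time grows linearly, in direct proportion to the size of the input data set.
O(N*N) describes an algorithm whose running time is proportional to the square of the size of the input data set.
T(N) = O(log N) describes an algorithm that runs in logarithmic time. Logarithmic-time algorithms are commonly found in operations on balanced binary trees or when using binary search.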

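Good general-purpose comparison sorts, however, run in O(N log N). An example is merge sort, which splits the array into two halves, sorts each half with a recursive call, and then merges the results back into a single array.

Here is an abstract C# snippet showing how an O(N log N) algorithm works (roughly the same approach could be used to build a pivot table):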

public static int[] MergeSort(int[] inputItems, int lowerBound, int upperBound) {
    if (lowerBound < upperBound) {
        int middle = lowerBound + (upperBound - lowerBound) / 2; // avoids int overflow on large bounds
        MergeSort(inputItems, lowerBound, middle);
        MergeSort(inputItems, middle + 1, upperBound);

        int[] leftArray = new int[middle - lowerBound + 1];
        int[] rightArray = new int[upperBound - middle];

        Array.Copy(inputItems, lowerBound, leftArray, 0, middle - lowerBound + 1);
        Array.Copy(inputItems, middle + 1, rightArray, 0, upperBound - middle);

        int i = 0;
        int j = 0;
        for (int count = lowerBound; count < upperBound + 1; count++) {
            if (i == leftArray.Length) {
                inputItems[count] = rightArray[j];
                j++;
            }
            else if (j == rightArray.Length) {
                inputItems[count] = leftArray[i];
                i++;
            }
            else if (leftArray[i] <= rightArray[j]) {
                inputItems[count] = leftArray[i];
                i++;
            }
            else {
                inputItems[count] = rightArray[j];
                j++;
            }
        }
    }
    return inputItems;
}
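
To connect the sorting step to aggregation, here is a minimal Python sketch of a sort-then-group sum. It only illustrates the general technique; nothing here is known to match what Excel does, and `pivot_sum` is a made-up name. Sorting brings equal keys next to each other in O(N log N), after which a single O(N) pass sums each run.

```python
from itertools import groupby
from operator import itemgetter


def pivot_sum(rows):
    """Aggregate (key, value) pairs by sorting, then summing each run.

    The sort costs O(N log N); the grouping pass afterwards is O(N).
    """
    ordered = sorted(rows, key=itemgetter(0))
    return {key: sum(value for _, value in run)
            for key, run in groupby(ordered, key=itemgetter(0))}


rows = [("CA", 7), ("AZ", 10), ("CA", 3), ("AZ", 5), ("NY", 1)]
totals = pivot_sum(rows)
print(totals)  # {'AZ': 15, 'CA': 10, 'NY': 1}
```

A hash-based grouping would avoid the sort altogether, but the sorted variant also returns the row labels in order, which a pivot table needs for display anyway.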

Answer 2

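Contrary to popular belief, pivot tables are not just an Excel feature; they exist in many applications that handle tabular numeric data. A pivot table is the visual, interactive result of the general concept of aggregating data by category.
Pivot tables are always linked to their data.
When you create a pivot table, Excel builds a special memory cache containing the data in the background. This pivot cache stores a copy of the data in the source data range.
If pivot tables refer to the same source data range, they share a pivot cache. This helps reduce file size and saves us from having to refresh every pivot table that shares the same source data range.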
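
The sharing rule can be pictured with a small Python sketch. It is a conceptual model only, assuming a cache keyed by the source range address; `PivotCacheRegistry`, `cache_for`, and `load_rows` are hypothetical names and do not correspond to anything in Excel's object model.

```python
class PivotCacheRegistry:
    """Hypothetical registry: one cache per distinct source range."""

    def __init__(self):
        self._caches = {}
        self.copies_made = 0

    def cache_for(self, source_range, load_rows):
        """Return the cache for source_range, copying the data only once."""
        if source_range not in self._caches:
            self._caches[source_range] = list(load_rows())
            self.copies_made += 1
        return self._caches[source_range]


def load_rows():
    # Stand-in for reading the worksheet range.
    return [("AZ", 10), ("CA", 7)]


registry = PivotCacheRegistry()
cache_a = registry.cache_for("ACCOUNTS!A1:B3", load_rows)
cache_b = registry.cache_for("ACCOUNTS!A1:B3", load_rows)  # same range: shared
cache_c = registry.cache_for("ACCOUNTS!A1:B2", load_rows)  # new range: new copy
print(cache_a is cache_b, registry.copies_made)  # True 2
```

Two tables on the same range end up holding the same object, so refreshing that one cache updates both, while a different range costs a second copy of the data.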

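The relationship between pivot tables and pivot caches can get complicated, especially because the pivot cache is stored in the background and there is no direct way to see which pivot tables in a workbook share a pivot cache.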

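Anatomy of a spreadsheet file

PivotCache class: defines the PivotCache class. When the object is serialized out as XML, its qualified name is x:pivotCache.
PivotCache members (Excel): represents the memory cache for a PivotTable report.
Represents the base class that all elements in an Office Open XML document derive from.
The OpenXML specification is a large and complicated beast.
cacheField (PivotCache Field): represents a single field in the PivotCache. This definition contains information about the field, such as its source, data type, and position within a level or hierarchy. The sharedItems element stores additional information about the data in this field. If there are no shared items, the values are stored directly in the pivotCacheRecords part.
SharedItems class: defines the SharedItems class. When the object is serialized out as XML, its qualified name is x:sharedItems.
How to create a pivot table in C++
How to update pivot table data from C# code
Showing memory usage in an Excel pivot table using C++
How to create an Excel pivot table from C++ (OLE/COM without MFC)
How to create a pivot table in Excel from C#.NET code
How to export data to one worksheet and create a pivot table from it in another worksheet
How to add pivot tables and slicers to MS Excel programmatically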
