有沒有更快的方法來過濾數據表？

Question

我有一個包含250萬行的數據表。 我想過濾數據表中的某些行。

數據表的列：

[IntCode] long
[BDIntCode] long
[TxnDT] DateTime
[TxnQuantity] decimal
[RecordUser] long
[RecordDT] DateTime

我的代碼如下：

            foreach (var down in breakDowns)
            {
                sw.Start();
                var relatedBreakDowns = firstGroup.Where(x => x.RelatedBDIntCode == down.ProcessingRowIntCode).ToList();
                if (relatedBreakDowns.Count == 0) continue;

                var filters = string.Format("BDIntCode IN ({0})", string.Join(",", relatedBreakDowns.Select(x => x.BDIntCode)));
                var filteredDatatable = datatable.Select(filters, "BDIntCode");
                foreach (var dataRow in filteredDatatable)
                {
                    var r = dataTableSchema.NewRow();
                    r["RecordUser"] = recordUser;
                    r["RecordDT"] = DateTime.Now;
                    r["TxnQuantity"] = dataRow["TxnQuantity"];
                    r["TxnDT"] = dataRow["TxnDT"];
                    r["BDIntCode"] = down.ProcessingRowIntCode;
                    dataTableSchema.Rows.Add(r);
                }
                sw.Stop();
                count++;
                Console.WriteLine("Group: " + unrelatedBreakDownGroup.RelatedBDGroupIntCode + ", Count : " + count + ", ElapsedTime : ms = " + sw.ElapsedMilliseconds + ", sec = " + sw.ElapsedMilliseconds / 1000f );
                sw.Reset();
            }

breakDowns列表的計數為1805，firstGroup列表的計數為9880。

Answer 1

就個人而言，我將從將其放入List<SomeType>而不是數據表開始。 然后，我將對數據建立索引：在您的情況下，您正在按RelatedBDIntCode搜索並期望有多個匹配項，因此：

var index = firstGroup.ToLookup(x => x.RelatedBDIntCode);
foreach (var down in breakDowns) {
    var matches = index[down.ProcessingRowIntCode].ToList();
    //...
}

這避免了做一個完整的掃描firstGroup對每一個項目breakDowns 。

下一個IN可以移到類似索引的搜索中，這次大概是在BDIntCode 。

Answer 2

只是為了詳細說明Marc的答案-您應該嘗試減少代碼執行的迭代次數。

當前編寫代碼的方式是，對故障收集集合進行1805次迭代，然后對於每個迭代，對firstGroup集合進行9880次迭代，因此總共進行了17833400次迭代，而沒有考慮數據表過濾器。

因此，您的方法應該是嘗試對數據進行索引，以減少執行的迭代次數。

所以，第一步可能是創建的索引映射RelatedBDIntCode到正確行datatable成字典。 然后，您可以遍歷breakDowns並為每個down拉出映射行，如下所示：

var dtIndexed = 
    firstGroup
    .GroupBy(x => x.RelatedBDIntCode)
    .ToDictionary
    (
        x => x.Key, //the RelatedBDIntCode you'll be selecting with 
        x =>        //the mapped rows. This is the same method of filtering, but you could try others
        {
            var filters = string.Format("BDIntCode IN ({0})", string.Join(",", x.Select(y => y.BDIntCode)));
            return datatable.Select(filters, "BDIntCode");
        }    
    );

foreach (var down in breakDowns)
{
    if(!dtIndexed.ContainsKey(down.ProcessingRowIntCode)) continue;

    var rows = dtIndexed[down.ProcessingRowIntCode];

    foreach (var row in rows)
    {
        var r = dataTableSchema.NewRow();
        r["RecordUser"] = recordUser;
        r["RecordDT"] = DateTime.Now;
        r["TxnQuantity"] = row["TxnQuantity"];
        r["TxnDT"] = row["TxnDT"];
        r["BDIntCode"] = down.ProcessingRowIntCode;
        dataTableSchema.Rows.Add(r);
    }
}

這種方法應減少代碼正在執行的迭代次數，從而提高性能。

請注意，在上面的代碼中，我使用了與對數據表執行過濾的方法完全相同的方法，即datatable.Select(filter, order) 。 您也可以嘗試使用datatable.AsEnumerable().Where(row => ...)

有沒有更快的方法來過濾數據表？

問題描述

2 個解決方案

解決方案1
0 2014-06-20 09:12:21

解決方案2
0 2014-06-20 12:40:48

有沒有更快的方法來過濾數據表？

問題描述

2 個解決方案

解決方案1 0 2014-06-20 09:12:21

解決方案2 0 2014-06-20 12:40:48

解決方案1
0 2014-06-20 09:12:21

解決方案2
0 2014-06-20 12:40:48