Collection operations: need help optimizing code in a report generator

Question

I'm creating a report-generating tool that uses custom data types from different sources in our system. The user can create a report schema and, depending on what is asked for, the data gets associated based on different index keys, times, time ranges, etc. The project is NOT running queries against a relational database; it's pure C# code working on collections in RAM.

I have a huge performance issue. I have been looking at my code for a few days and am struggling to optimize it.

I stripped the code down to the minimum for a short example of what the profiler points to as the problematic algorithm; the real version is a bit more complex, with more conditions and date handling.

In short, this function returns a subset of "values" satisfying the conditions, depending on the keys of the values that were selected from the "index rows".

private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
    var checkContainers = ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
                           values.Any(t => t.ContainerId.HasValue));

    var checkEnterpriseId = ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
                             values.Any(t => t.EnterpriseId.HasValue));

    var ret = new List<LoadedDataSource>();
    foreach (var value in values)
    {
        var valid = true;

        foreach (var index in indexRows)
        {
            // ContainerId
            var indexConservedSource = index.AsEnumerable();
            if (checkContainers && index.CheckContainer && value.ContainerId.HasValue)
            {
                indexConservedSource = indexConservedSource.Where(t => t.ContainerId.HasValue && t.ContainerId.Value == value.ContainerId.Value);
                if (!indexConservedSource.Any())
                {
                    valid = false;
                    break;
                }
            }

            //EnterpriseId
            if (checkEnterpriseId && index.CheckEnterpriseId && value.EnterpriseId.HasValue)
            {
                indexConservedSource = indexConservedSource.Where(t => t.EnterpriseId.HasValue && t.EnterpriseId.Value == value.EnterpriseId.Value);
                if (!indexConservedSource.Any())
                {
                    valid = false;
                    break;
                }
            }
        }

        if (valid)
            ret.Add(value);
    }

    return ret;
}

This works for small samples, but as soon as I have thousands of values and 2-3 index rows with a few dozen values each, it can take hours to generate.

As you can see, I try to break as soon as an index condition fails and move on to the next value.

I could probably do everything in a single "values.Where(####).ToList()", but that condition gets complex fast.

I tried generating an IQueryable around indexConservedSource, but it was even worse. I tried using Parallel.ForEach with a ConcurrentBag for "ret", and it was also slower.

What else can be done?

Answer 1

What you are doing, in principle, is calculating the intersection of two sequences. You use two nested loops, which is slow because the running time is O(m*n). You have two other options:

sort both sequences and merge them
convert one sequence into a hash table and test the second against it
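
For comparison, the first option could look roughly like the sketch below. It works on plain integer keys rather than on the question's LoadedDataSource type, and the SortedMerge class and its Intersect method are hypothetical names made up for this illustration. Sorting costs O(m log m + n log n), after which a single pass over both lists finds the common keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class SortedMerge
{
    // Returns the distinct keys present in both inputs by sorting copies
    // of them and advancing two cursors in step.
    public static List<int> Intersect(IEnumerable<int> left, IEnumerable<int> right)
    {
        var a = new List<int>(left);
        var b = new List<int>(right);
        a.Sort();
        b.Sort();

        var result = new List<int>();
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
        {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else
            {
                // Equal keys: record once, then skip duplicates on both sides.
                int key = a[i];
                result.Add(key);
                while (i < a.Count && a[i] == key) i++;
                while (j < b.Count && b[j] == key) j++;
            }
        }
        return result;
    }

    public static void Main()
    {
        var common = Intersect(new[] { 5, 1, 3, 3 }, new[] { 3, 4, 5 });
        Console.WriteLine(string.Join(", ", common)); // prints "3, 5"
    }
}
```

A hash table avoids the sorting step and gives an expected cost of O(m + n).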

The second approach seems better for this scenario. Just convert those index lists into HashSets and test the values against them. I added some code for inspiration:

private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
    var ret = values;

    if ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
        ret.Any(t => t.ContainerId.HasValue))
    {
        var indexes = indexRows
            .Where(i => i.CheckContainer)
            .Select(i => new HashSet<int>(i
                .Where(h => h.ContainerId.HasValue)
                .Select(h => h.ContainerId.Value)))
            .ToList();

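        // Keep values without a ContainerId; otherwise the id must be in every index set.
        // .Value is needed because the sets hold int, not int?.
        ret = ret.Where(v => v.ContainerId == null
                        || indexes.All(i => i.Contains(v.ContainerId.Value)))
                 .ToList();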
    }

    if ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
        ret.Any(t => t.EnterpriseId.HasValue))
    {
        var indexes = indexRows
            .Where(i => i.CheckEnterpriseId)
            .Select(i => new HashSet<int>(i
                .Where(h => h.EnterpriseId.HasValue)
                .Select(h => h.EnterpriseId.Value)))
            .ToList();

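        // Keep values without an EnterpriseId; otherwise the id must be in every index set.
        // .Value is needed because the sets hold int, not int?.
        ret = ret.Where(v => v.EnterpriseId == null
                        || indexes.All(i => i.Contains(v.EnterpriseId.Value)))
                 .ToList();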
    }

    return ret;
}
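
To try the idea in isolation, here is a minimal self-contained sketch of the same hash-set filter. The Row class is a hypothetical stand-in for LoadedDataSource that models only the two nullable keys, and AssociationFilter.FilterByKey is a name invented for this example. The CheckContainer and CheckEnterpriseId flags from the question are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for LoadedDataSource; only the two keys are modeled.
public sealed class Row
{
    public int? ContainerId { get; set; }
    public int? EnterpriseId { get; set; }
}

public static class AssociationFilter
{
    // Builds one HashSet per index row for the chosen key, then keeps the
    // values whose key is either missing or present in every set.
    public static List<Row> FilterByKey(
        IEnumerable<IEnumerable<Row>> indexRows,
        IEnumerable<Row> values,
        Func<Row, int?> key)
    {
        var sets = indexRows
            .Select(row => new HashSet<int>(
                row.Select(key).Where(k => k.HasValue).Select(k => k.Value)))
            .ToList();

        return values
            .Where(v =>
            {
                var k = key(v);
                return !k.HasValue || sets.All(s => s.Contains(k.Value));
            })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var indexRows = new List<List<Row>>
        {
            new List<Row> { new Row { ContainerId = 1 }, new Row { ContainerId = 2 } },
            new List<Row> { new Row { ContainerId = 2 }, new Row { ContainerId = 3 } },
        };
        var values = new List<Row>
        {
            new Row { ContainerId = 1 },    // only in the first index row
            new Row { ContainerId = 2 },    // in both index rows
            new Row { ContainerId = null }  // no key, so it is not filtered out
        };

        var kept = AssociationFilter.FilterByKey(indexRows, values, r => r.ContainerId);
        Console.WriteLine(string.Join(", ",
            kept.Select(r => r.ContainerId?.ToString() ?? "null"))); // prints "2, null"
    }
}
```

One difference from the question's loop is worth checking against the real requirements: there, the EnterpriseId test runs on the index entries that already matched the ContainerId, so a single index entry has to match both keys. Filtering each key through its own set accepts a value whose two keys match different entries of the same index row. If the combined match is required, a set of (ContainerId, EnterpriseId) pairs per index row would be one way to keep that behavior.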
