C# - Custom GroupBy taking more time for large dataset

The code below groups result (a List of ClassTypeObject with 500,000 items) by Field1 into a list of anonymous-type objects.

The GroupBy takes around 40 to 50 seconds to execute. Is there any way to optimize it?

var groupByTest = result
    .GroupBy(g => new
    {
        Field1 = g.Field1
    })
    .Select(gp => new
    {
        gp.Key.Field1,
        InnerList = result.Where(x => x.Field1 == gp.Key.Field1).ToList()
    })
    .ToList();

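The query is slow because InnerList is built from the ungrouped collection, result. For every group, the Where call scans all 500,000 items again, so the total work grows with the number of groups times the number of items. Change the inner assignment to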

InnerList = gp.ToList()

because gp already contains exactly the items whose Field1 matches the group key.

Full code:

var groupByTest = result
    .GroupBy(g => new
    {
        Field1 = g.Field1
    })
    .Select(gp => new
    {
        gp.Key.Field1,
        // gp already holds the items of this group, so no second scan of result is needed
        InnerList = gp.ToList()
    })
    .ToList();

As the query is written, InnerList does end up containing just the items in the group, but the original source is scanned once for each group key. The equivalent query:

var groupByTest = result
    .GroupBy(g => g.Field1)
    .Select(gp => new
    {
        Field1 = gp.Key,
        InnerList = gp.ToList()
    })
    .ToList();

This version scans the source only once.
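To make the cost difference concrete, here is a minimal sketch that counts how many comparisons the per-group Where performs. The data is a hypothetical stand-in for the asker's list (20,000 anonymous-type items over 1,000 keys, with an invented Value property), and the comparisons counter exists only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the asker's data: 20,000 items spread over 1,000 keys.
var result = Enumerable.Range(0, 20_000)
    .Select(i => new { Field1 = i % 1_000, Value = i })
    .ToList();

long comparisons = 0;

// Original shape: every group filters the whole source list again.
var slow = result
    .GroupBy(g => g.Field1)
    .Select(gp => new
    {
        Field1 = gp.Key,
        InnerList = result.Where(x => { comparisons++; return x.Field1 == gp.Key; }).ToList()
    })
    .ToList();

// Fixed shape: each group already holds its own items.
var fast = result
    .GroupBy(g => g.Field1)
    .Select(gp => new { Field1 = gp.Key, InnerList = gp.ToList() })
    .ToList();

// 1,000 groups x 20,000 items = 20,000,000 comparisons in the slow version.
Console.WriteLine($"groups: {fast.Count}, comparisons in slow version: {comparisons}");
// prints "groups: 1000, comparisons in slow version: 20000000"
```

Both queries produce the same groups. Only the amount of work differs, and the gap widens as the number of distinct keys grows.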

Once this is fixed, the query can be parallelized easily with AsParallel():

var groupByTest = result
    .AsParallel()
    .GroupBy(g => g.Field1)
    .Select(gp => new
    {
        Field1 = gp.Key,
        InnerList = gp.ToList()
    })
    .ToList();

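PLINQ partitions the data across the available cores, groups the partitions, and builds the final list. The order of the groups in the result is not guaranteed unless AsOrdered() is added, and the speedup depends on the data and the machine, so measure before relying on it.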
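As a rough check that the parallel query yields the same groups as the sequential one, the sketch below compares group counts and sizes. The data is again a hypothetical stand-in (500,000 anonymous-type items over 1,000 keys). The comparison is keyed by Field1 rather than by position, on the assumption that the parallel result may come back in a different order.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the asker's data: 500,000 items spread over 1,000 keys.
var result = Enumerable.Range(0, 500_000)
    .Select(i => new { Field1 = i % 1_000, Value = i })
    .ToList();

// Sequential baseline: group key -> number of items in the group.
var sequential = result
    .GroupBy(g => g.Field1)
    .ToDictionary(gp => gp.Key, gp => gp.Count());

// Parallel version of the fixed query.
var parallelGroups = result
    .AsParallel()
    .GroupBy(g => g.Field1)
    .Select(gp => new { Field1 = gp.Key, InnerList = gp.ToList() })
    .ToList();

// Compare by key, not by position, because group order may differ.
bool same = parallelGroups.Count == sequential.Count
    && parallelGroups.All(p => sequential[p.Field1] == p.InnerList.Count);

Console.WriteLine(same); // prints "True"
```

Wrapping each version in a System.Diagnostics.Stopwatch is a simple way to see whether parallelism pays off for a given data set.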

