The code below groups result (a List of ClassTypeObject with 500,000 items) into a List<a> type, where a is an anonymous type.
The GroupBy takes around 40 to 50 seconds to execute. Is there any way to optimize this?
var groupByTest = result.
    GroupBy(g => new
    {
        First = g.Field1
    }).
    Select(gp => new
    {
        // The key's property is named First, so it must be read as gp.Key.First.
        Field1 = gp.Key.First,
        InnerList = result.Where(x => x.Field1 == gp.Key.First).ToList()
    }).ToList();
You are selecting InnerList from the non-grouped collection, i.e. result, so the whole list is filtered again for every group. That is why your query takes so long. You can change the inner assignment to
InnerList = gp.ToList()
because gp already contains only the items grouped by Field1.
Full code:
var groupByTest = result.
    GroupBy(g => new
    {
        First = g.Field1
    }).
    Select(gp => new
    {
        // The key's property is named First, so it must be read as gp.Key.First.
        Field1 = gp.Key.First,
        InnerList = gp.ToList()
    }).ToList();
The way this query is written, InnerList ends up containing just the items in the group. In its current form, however, the original source is scanned once for each group key, so every distinct Field1 value costs another full pass over the 500,000 items. The equivalent query:
var groupByTest = result.GroupBy(g => g.Field1)
    .Select(gp => new {
        Field1 = gp.Key,
        InnerList = gp.ToList() })
    .ToList();
scans the source only once.
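To make the single-pass version easy to try, here is a minimal self-contained sketch. The ClassTypeObject class, its Id property, and the synthetic data (500,000 items spread over 1,000 keys) are assumptions for illustration only; the question does not show the real type or the real key distribution. The sketch returns tuples instead of an anonymous type so the result can leave the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ClassTypeObject.
public class ClassTypeObject
{
    public int Field1 { get; set; }
    public int Id { get; set; }
}

public static class GroupingDemo
{
    // Single pass over the source: each group already carries its own items,
    // so there is no need to filter the original list again.
    public static List<(int Field1, List<ClassTypeObject> InnerList)> GroupOnce(
        IEnumerable<ClassTypeObject> source)
    {
        return source
            .GroupBy(x => x.Field1)
            .Select(gp => (Field1: gp.Key, InnerList: gp.ToList()))
            .ToList();
    }

    public static void Main()
    {
        // Synthetic data (an assumption): 500,000 items, 1,000 distinct keys.
        var result = Enumerable.Range(0, 500_000)
            .Select(i => new ClassTypeObject { Field1 = i % 1_000, Id = i })
            .ToList();

        var groups = GroupOnce(result);

        Console.WriteLine(groups.Count);                       // number of distinct Field1 values
        Console.WriteLine(groups.Sum(g => g.InnerList.Count)); // total items across all groups
    }
}
```

Every item lands in exactly one group, so the per-group counts should add up to the size of the source list.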
Once this is fixed, the query can be parallelized easily with AsParallel():
var groupByTest = result.AsParallel()
    .GroupBy(g => g.Field1)
    .Select(gp => new {
        Field1 = gp.Key,
        InnerList = gp.ToList() })
    .ToList();
This lets PLINQ use the available cores in the machine to partition the data, group it, and construct the final list.
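The parallel variant can be checked against the sequential one with a small sketch like the following. Again, ClassTypeObject, its Id property, and the synthetic data are assumptions, not the asker's real type. Because PLINQ does not preserve source order by default, the sketch stores the groups in a dictionary keyed by Field1 rather than relying on the order in which groups come back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ClassTypeObject.
public class ClassTypeObject
{
    public int Field1 { get; set; }
    public int Id { get; set; }
}

public static class ParallelGroupingDemo
{
    // Groups in parallel; the result is keyed by Field1 so callers
    // do not depend on the order in which PLINQ yields the groups.
    public static Dictionary<int, List<ClassTypeObject>> GroupInParallel(
        IEnumerable<ClassTypeObject> source)
    {
        return source
            .AsParallel()
            .GroupBy(x => x.Field1)
            .ToDictionary(gp => gp.Key, gp => gp.ToList());
    }

    public static void Main()
    {
        // Synthetic data (an assumption): 500,000 items, 1,000 distinct keys.
        var result = Enumerable.Range(0, 500_000)
            .Select(i => new ClassTypeObject { Field1 = i % 1_000, Id = i })
            .ToList();

        var parallel = GroupInParallel(result);

        // Sequential reference: group sizes per key.
        var sequential = result
            .GroupBy(x => x.Field1)
            .ToDictionary(gp => gp.Key, gp => gp.Count());

        // Both approaches should produce the same keys and the same group sizes.
        bool sameSizes = parallel.Count == sequential.Count
            && parallel.All(kv => sequential[kv.Key] == kv.Value.Count);
        Console.WriteLine(sameSizes);
    }
}
```

Whether the parallel version is actually faster depends on the data and the machine. With a key selector this cheap, the partitioning overhead can outweigh the gain, so it is worth timing both versions on the real list before committing to AsParallel().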