
How to optimize this query using EF

Hi, good day. I am new to Entity Framework and would like to know whether there is a way to improve my implementation. Here is the code.

public async Task<List<Record>> GetRecordsByBatchId(string batchId, string source)
{
    // [1] Get all parent IDs from FirstTable, filtered by source and batchId.
    var parentIds = await _context.Set<FirstTable>()
        .Where(a => a.IsActive
            && a.BatchId.Equals(batchId)
            && a.Source.Equals(source))
        .Select(b => b.ParentId)
        .ToListAsync();

    if (parentIds.Count == 0)
    {
        return new List<Record>();
    }

    // [2] Look up the IdNumber of each parent ID from [1] in SecondTable.
    List<long> idNumber = await _context.Set<SecondTable>()
        .Where(a => parentIds.Contains(a.Id))
        .Select(b => b.IdNumber)
        .ToListAsync();

    // [3] Query the records whose IdNumber was returned by [2].
    // One or more records may share the same IdNumber.
    List<Risk> recordByIdNumber = await _context.Set<SecondTable>()
        .Where(a => idNumber.Contains(a.IdNumber))
        .ToListAsync();

    // [4] Group the records from [3] by IdNumber, sort each group by
    // EndorsementNumber in descending order, and return the record with
    // the highest EndorsementNumber from each group.
    return (from record in recordByIdNumber
            group record by record.IdNumber into g
            orderby g.Key
            select g.OrderByDescending(risk => risk.EndorsementNumber).FirstOrDefault()).ToList();
}

The model for the FirstTable

public class FirstTable
{
    public Guid? ParentId { get; set; }
    public string BatchId { get; set; }
    public string Source { get; set; }
    public bool IsActive { get; set; }
}

The model for the SecondTable

public class SecondTable
{
    public Guid Id { get; set; }
    public int EndorsementNumber { get; set; }
    public long IdNumber { get; set; }
}

Note: I have included only the relevant properties in the models.

This approach works as expected. I would just like to know whether these queries can be optimized so that SecondTable is queried only once.

Any help would be greatly appreciated, thanks in advance.

var parentIds = _context.Set<FirstTable>()
    .Where(a => a.IsActive
        && a.BatchId.Equals(batchId)
        && a.Source.Equals(source))
    .Select(b => new { b.ParentId });

var risks = await (from s in _context.Set<SecondTable>()
                   join p in parentIds on s.Id equals p.ParentId
                   join r in _context.Set<SecondTable>() on s.IdNumber equals r.IdNumber
                   select r)
    .GroupBy(r => r.IdNumber)
    .Select(g => g.OrderByDescending(risk => risk.EndorsementNumber).FirstOrDefault())
    .ToArrayAsync();

return risks;

This uses one query instead of three. Because parentIds is not materialized, it is composed into the main query, and the benefit grows with the number of rows the first query would return.

EDIT: As @SvyatoslavDanyliv mentioned in the comments, group-and-take operations may not work, depending on the EF version and the provider you use. In that case you may need to run the query first and do the grouping in memory, as below:

var result = await (from s in _context.Set<SecondTable>()
                    join p in parentIds on s.Id equals p.ParentId
                    join r in _context.Set<SecondTable>() on s.IdNumber equals r.IdNumber
                    select r).ToArrayAsync();

var risks = result
    .GroupBy(r => r.IdNumber)
    .Select(g => g.OrderByDescending(risk => risk.EndorsementNumber).FirstOrDefault())
    .ToArray();

return risks;
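
To see what the in-memory grouping step does, here is a minimal, self-contained sketch that runs on a plain list instead of a database. The LatestEndorsements class and the PickLatest method are hypothetical names chosen for illustration; only the SecondTable shape comes from the question. It also orders the groups by IdNumber, as step [4] of the question does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SecondTable
{
    public Guid Id { get; set; }
    public int EndorsementNumber { get; set; }
    public long IdNumber { get; set; }
}

// Hypothetical helper, not part of the original code.
public static class LatestEndorsements
{
    // For each IdNumber, keep only the row with the highest EndorsementNumber.
    // The result is ordered by IdNumber.
    public static List<SecondTable> PickLatest(IEnumerable<SecondTable> rows)
    {
        return rows
            .GroupBy(r => r.IdNumber)
            .OrderBy(g => g.Key)
            // A group always has at least one element, so First() is safe here.
            .Select(g => g.OrderByDescending(r => r.EndorsementNumber).First())
            .ToList();
    }
}
```

With the fallback above, this could be called as LatestEndorsements.PickLatest(result) once the rows have been loaded.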

Yes, queries 1-3 can and should be combined. To do that, you need navigation properties in your model. There seems to be a one-to-many relationship between FirstTable and SecondTable. Let's use Customer and Order as an example instead.

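class Customer
{
    public int CustomerId { get; set; }
    public string BatchId { get; set; }
    public ICollection<Order> Orders { get; set; }
}

class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public Customer Customer { get; set; }
    public Risk Risk { get; set; }
}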

In that case, the third query becomes simply:

List<Risk> risks = await _context.Orders
    .Where(o => o.Customer.BatchId == batchId)
    .Select(o => o.Risk)
    .ToListAsync();

Obviously, I am only guessing at the structure and the relationships, but hopefully this gets you started. To me, Contains() on an in-memory list is a code smell: the first query may well return a large list, and Contains() then produces a huge IN clause in the database, which can hurt performance badly or even make the query fail.
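
One way to avoid a large literal IN list without changing the model, sketched here under assumptions, is to leave the first query unmaterialized (no ToListAsync) so that Contains() is applied to an IQueryable. With EF, a query composed this way is typically translated into a subquery instead of a list of values, though the exact SQL depends on the EF version and the provider. The BatchQueries class and the RecordsForBatch method are hypothetical names; the entity shapes follow the question.

```csharp
using System;
using System.Linq;

public class FirstTable
{
    public Guid? ParentId { get; set; }
    public string BatchId { get; set; }
    public string Source { get; set; }
    public bool IsActive { get; set; }
}

public class SecondTable
{
    public Guid Id { get; set; }
    public int EndorsementNumber { get; set; }
    public long IdNumber { get; set; }
}

// Hypothetical helper, not part of the original code.
public static class BatchQueries
{
    // Builds one composable query; nothing is executed inside this method.
    public static IQueryable<SecondTable> RecordsForBatch(
        IQueryable<FirstTable> firstTable,
        IQueryable<SecondTable> secondTable,
        string batchId,
        string source)
    {
        // Still an IQueryable: no ID list is pulled into memory.
        var parentIds = firstTable
            .Where(a => a.IsActive
                && a.BatchId == batchId
                && a.Source == source
                && a.ParentId != null)
            .Select(a => a.ParentId.Value);

        // IdNumbers of the rows referenced by those parent IDs.
        var idNumbers = secondTable
            .Where(s => parentIds.Contains(s.Id))
            .Select(s => s.IdNumber);

        // All rows that share one of those IdNumbers.
        return secondTable.Where(r => idNumbers.Contains(r.IdNumber));
    }
}
```

With EF this might be called as BatchQueries.RecordsForBatch(_context.Set<FirstTable>(), _context.Set<SecondTable>(), batchId, source) followed by ToListAsync(). The same method also runs against in-memory collections through AsQueryable(), which is convenient for trying it out.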
