
How to divide a LINQ query into small groups to avoid a Timeout expired exception?

I have a LINQ query like the following:

HashSet<Guid> temp1; // Getting this value through another method

var Ids = temp1.Except((from temp2 in prodContext.Documents 
                        where temp1.Contains(temp2.id) 
                        select temp2.id)).ToList();   

Here, temp1 has around 40k values, and I sometimes get a timeout error. How can I divide this query using a while (or any other) loop so that it won't time out? I tried setting Connect Timeout in the connection string and on the database context, but nothing worked.
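
For reference, here is roughly what I tried (a sketch assuming EF6; on EF Core it would be Database.SetCommandTimeout instead):

// Connection string: Connect Timeout only applies to opening the
// connection, not to how long a single command may run.
// "Data Source=...;Initial Catalog=...;Connect Timeout=300"

// On the context (EF6): command timeout in seconds.
prodContext.Database.CommandTimeout = 300;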

Any suggestions, please?

This is one of those unusual operations that can quite plausibly be performed more effectively in memory by the application than by the database. Instead of sending all of the id values in your set to the DB, having it find all of the items with those IDs, and then having it send them all back to you, it may well be better to just fetch all of the document ids and do the filtering on the application side.

var documentIds = prodContext.Documents.Select(doc => doc.id);
var Ids = temp1.Except(documentIds).ToList();   

Now, depending on how many documents you have, even that could theoretically time out. If it does, you'll need to paginate the fetching of the document IDs. You can use the following method to paginate any query, so you avoid fetching the entire result set all at once:

public static IEnumerable<IEnumerable<T>> Paginate<T>(
    this IQueryable<T> query,
    int pageSize)
{
    // Note: with LINQ to Entities the query must already be ordered
    // (e.g. via OrderBy), since Skip requires sorted input.
    int page = 0;
    while (true)
    {
        // Fetch one page of results at a time
        var nextPage = query.Skip(page * pageSize)
            .Take(pageSize)
            .ToList();
        if (nextPage.Any())
            yield return nextPage;
        else
            yield break;
        page++;
    }
}

This allows you to write:

var documentIds = prodContext.Documents.Select(doc => doc.id)
    //Skip/Take needs a stable order in LINQ to Entities
    .OrderBy(id => id)
    //play around with different batch sizes to see what works best
    .Paginate(someBatchSize)
    .SelectMany(x => x);
temp1.ExceptWith(documentIds);
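
Note that ExceptWith mutates temp1 in place rather than returning a new collection, and because Paginate is lazy, the pages are only fetched as ExceptWith enumerates them, so at most one page of ids is materialized at a time.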

Here's a way to do it that combines both pagination and caching.

This way it only caches one page at a time, which prevents memory overload and should prevent the timeout. I hope this works.

int pageSize = 1000;

HashSet<Guid> temp1; // populated through another method, as in the question

List<Guid> idsFromTable = new List<Guid>();

var Ids = temp1.ToList();
for (int i = 0; true; i++)
{
    // Cache one page of the table locally so the Except below runs
    // in memory instead of in the database.
    // (With EF, Skip requires an ordered query, hence the OrderBy.)
    idsFromTable.AddRange(prodContext.Documents
        .OrderBy(x => x.id)
        .Skip(pageSize * i)
        .Take(pageSize)
        .Select(x => x.id));

    if (idsFromTable.Any())
    {
        // Then use the cached list instead of the data context
        Ids = Ids.Except(idsFromTable).ToList();
        idsFromTable.Clear();
    }
    else
        break;
}
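
A note on the design: Ids = Ids.Except(idsFromTable).ToList() rebuilds the whole list on every page. A minimal variant of the same idea (assuming the same prodContext and pageSize as above) keeps the ids in a HashSet<Guid> and removes each page in place with ExceptWith:

var remaining = new HashSet<Guid>(temp1);
for (int i = 0; true; i++)
{
    var page = prodContext.Documents
        .OrderBy(x => x.id)
        .Skip(pageSize * i)
        .Take(pageSize)
        .Select(x => x.id)
        .ToList();
    if (!page.Any())
        break;

    // Drop every id that does exist in the table
    remaining.ExceptWith(page);
}
// remaining now holds the ids from temp1 that are not in Documents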

Similarly to Servy's answer, you might want to try paginating the queries rather than pulling everything in at once. How efficient this is depends on the DB you are using; I certainly got some mileage out of it on Informix. In this case the logic would look like this:

HashSet<Guid> ids... // Got from another method
List<Guid> validIds = new List<Guid>();

const Int32 BUFFERSIZE = 1000;
var validationBuffer = new List<Guid>(BUFFERSIZE);

foreach (var g in ids)
{
    validationBuffer.Add(g);
    if (validationBuffer.Count == BUFFERSIZE)
    {
        // Query one batch of ids at a time so the generated
        // IN (...) clause stays small.
        validIds.AddRange(
            prodContext.Documents
                .Select(t => t.id)
                .Where(id => validationBuffer.Contains(id)));
        validationBuffer.Clear();
    }
}

// Do the last (partial) batch
if (validationBuffer.Any())
{
    validIds.AddRange(
        prodContext.Documents
            .Select(t => t.id)
            .Where(id => validationBuffer.Contains(id)));
}

var missingIds = ids.Except(validIds);
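
On newer runtimes the manual buffering can be avoided entirely: .NET 6 added Enumerable.Chunk, which splits a sequence into fixed-size arrays. A minimal sketch of the same batching with it (assuming .NET 6+ and that the provider translates Contains as above):

const int batchSize = 1000;
var validIds = new List<Guid>();

foreach (var batch in ids.Chunk(batchSize))
{
    // batch is a Guid[] of at most batchSize elements
    validIds.AddRange(
        prodContext.Documents
            .Select(t => t.id)
            .Where(id => batch.Contains(id)));
}

var missingIds = ids.Except(validIds);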
