简体   繁体   中英

How to find matching records in data table

I've got a data table with with two collumns: a number and a date, for example :

125 | 2013/10/20
100 | 2013/10/21
150 | 2013/10/24
225 | 2013/10/24
250 | 2013/10/28
310 | 2013/10/30

now, I want to search for all records ordered by date where the sum of the number matches 500. I can easily see that the first, third and fourth record (125 + 150 + 225 = 500) provide a match , but to program such, I can only think of going through the data table a zillion of times until I found the right match.

Has anybody a smarter idea?

In the worst case, you do have to go through all 2^n subsets of your dataset, but if all of your items are non-negative you can start by filtering on item.Number <= 500 .

Here is a possible Subsets method (actually an answer to How to get all subsets of an array? , but don't tell them):

public static IEnumerable<IEnumerable<T>> Subsets(this IEnumerable<T> source)
{
    var first = source.FirstOrDefault();
    if (first == null) return new[] { Enumerable.Empty<T>() };

    var others = source.Skip(1).Subsets();
    return others.Concat(others.Select(s => s.Concat(new { first })));
}

Once you have your Subsets method, then you can filter the result as follows, though performance is still of the order of a gazillion (or 2^n if you want to be picky).

var sets = items.Where(i => i.Number <= 500)
    .Subsets().Where(s => s.Sum(i => i.Number) == 500);

However, if you do have the constraint on Number , that it is non-negative, you can combine the Subsets operation with a search for a target sum. That would mean you would define

public static IEnumerable<IEnumerable<T>> SubsetsAddingUpTo(this IEnumerable<T> source, int target)
{
    // This stopping condition ensures that you will not have to walk the rest of the tree when you have already hit or exceeded your target.
    // It assumes that the Number values are all non-negative.
    if (target < 0) return Enumerable.Empty<IEnumerable<T>>();

    var first = source.FirstOrDefault();
    if (first == null) return Enumerable.Empty<IEnumerable<T>>();

    var tail = source.Skip(1).Where(i => i.Number <= target).ToList();

    var othersIncludingFirst = tail.SubsetsAddingUpTo(target - first.Number);
    var othersExcludingFirst = tail.SubsetsAddingUpTo(target);

    return othersExcludingFirst.Concat(othersIncludingFirst.Select(s => s.Concat(new { first })));
}

Because the check for <= target happens inside the method, you don't have to do any pre-filtering. However, you can perform a sort before you do the search, to give you your sets in a hierarchical date order. The call will be

var sets = items.OrderByDescending(i => i.Date).SubsetsAddingUpTo(500);

This should actually give you decent performance. The worst case (every item has a number of 0 or 1) won't be very good (order 2^n ), but if most of the values of Number are of a similar order of magnitude to your target sum, as is the case in your example, then the stopping condition will fly in to your rescue and save you a large number of unnecessary operations.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM