
Combinations, Power Sets: No idea where to even start

I have a problem and I was hoping people could point me in the right direction, because I don't even know where to start.

Here's the setup: I have two tables in SQL Server. Table A is a summary table and Table B is a details table, something like this:

Table A
ParentID        Total Amount
1               100
2               587


Table B
ParentID        ChildID         Amount
1               1               8
1               2               7
1               3               18
1               4               93
2               5               500
2               6               82
2               7               5
2               8               10

So for each ParentID, I need to find the combination of children whose Amounts add up to the Total Amount of the parent.

So for ParentID 1 (100) it would be ChildIDs 2 and 4 (7 + 93) and I would just ignore ChildIDs 1 and 3.

For ParentID 2 (587) it would be ChildIDs 5, 6 and 7 (500 + 82 + 5), and I would ignore 8.

There is no fixed number of children that can be combined to equal the parent.

From doing some research, it appears I need to get the power set of all the children for each parent. From there I can sum their amounts and see whether any of them equal the parent's total. However, correct me if I'm wrong, but if there are N items in the set, the power set consists of 2^N combinations.
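To make the 2^N concern concrete, here is a minimal brute-force sketch of the power-set idea (not taken from any answer) that enumerates every subset of one parent's children with a bitmask, using the sample amounts from Table B. The class and method names are hypothetical. It only works for small N: a `long` mask limits it to 62 children, and the outer loop runs 2^N times.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only
public class BruteForceSubsetSum {
    // Returns the ChildIDs of the first subset whose amounts sum to target, or null.
    // Bit i of mask decides whether child i is in the subset, so there are 2^n masks.
    static List<Integer> findChildren(int[] childIds, int[] amounts, int target) {
        int n = amounts.length;
        if (n > 62) {
            throw new IllegalArgumentException("too many children for brute force");
        }
        for (long mask = 0; mask < (1L << n); mask++) {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << i)) != 0) {
                    sum += amounts[i];
                }
            }
            if (sum == target) {
                List<Integer> chosen = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    if ((mask & (1L << i)) != 0) {
                        chosen.add(childIds[i]);
                    }
                }
                return chosen;
            }
        }
        return null; // no subset matches
    }

    public static void main(String[] args) {
        // ParentID 1 and ParentID 2 from Table B
        System.out.println(findChildren(new int[]{1, 2, 3, 4}, new int[]{8, 7, 18, 93}, 100)); // prints [2, 4]
        System.out.println(findChildren(new int[]{5, 6, 7, 8}, new int[]{500, 82, 5, 10}, 587)); // prints [5, 6, 7]
    }
}
```

With four children this is 16 iterations; with 750 it is hopeless, which is exactly the problem.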

Some of these parents have over 750 children and 2^750 is a very very very large number. I'm mostly a .NET/SQL Server guy but am open to trying any technologies that people would think are right for the job.

So a few questions.

1) Should I go down the path of trying to figure out the power set for each parent, or am I barking up the wrong tree with that?
2) Is this an algorithm that has already been figured out and I'm just doing a poor job finding it on Google?
3) Assuming this can be done, what would be the right approach to solving it?

This is the subset sum problem, which reduces to a simple 0/1 knapsack problem. There is a dynamic programming solution:

W = knapsack capacity = Total Amount of parent.

item weight = item cost = child amount.

Maximize profit; if the maximum profit equals W, then such a subset exists, otherwise it does not.

Use the knapsack DP to solve this problem and recover the subset by backtracking through the table.
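Written out, the table that the Java code below fills in follows the standard 0/1 knapsack recurrence with weight equal to value. In this sketch a_i is the amount of child i (children indexed from 0 to N-1), W is the parent's total, and c_{i,j} (notation introduced here, matching `costs[i][j]` in the code) is the largest sum not exceeding j that can be formed from children 0..i:

```latex
c_{0,j} =
\begin{cases}
  a_0 & \text{if } a_0 \le j \\
  0   & \text{otherwise}
\end{cases}
\qquad
c_{i,j} =
\begin{cases}
  \max\bigl(c_{i-1,j},\; c_{i-1,\,j-a_i} + a_i\bigr) & \text{if } a_i \le j \\
  c_{i-1,j} & \text{otherwise}
\end{cases}
```

for i = 1, ..., N-1 and j = 0, ..., W. A subset summing to W exists exactly when c_{N-1,W} = W.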

Here is a solution in Java; maybe you can convert it to C#:

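public class SubSetSum {
    // costs[i][j] = largest sum <= j that can be formed from arr[0..i]
    static int[][] costs;

    public static void calSets(int target, int[] arr) {
        costs = new int[arr.length][target + 1];

        // Row 0: with only arr[0] available, any capacity >= arr[0] can hold it
        for (int j = 0; j <= target; j++) {
            if (arr[0] <= j) {
                costs[0][j] = arr[0];
            }
        }

        // Row i: either skip arr[i], or take it on top of the best sum for j - arr[i]
        for (int i = 1; i < arr.length; i++) {
            for (int j = 0; j <= target; j++) {
                costs[i][j] = costs[i - 1][j];
                if (arr[i] <= j) {
                    costs[i][j] = Math.max(costs[i][j], costs[i - 1][j - arr[i]] + arr[i]);
                }
            }
        }

        System.out.println("total amount: " + costs[arr.length - 1][target]);
        // A subset exists only if the best reachable sum is exactly the target
        if (costs[arr.length - 1][target] == target) {
            System.out.println("Sets (indices into arr):");
            printSets(arr, arr.length - 1, target, "");
        } else {
            System.out.println("No such Set found");
        }
    }

    // Walks the table backwards and prints every subset of arr[0..n] that sums to w.
    // Invariant on every call: costs[n][w] == w.
    public static void printSets(int[] arr, int n, int w, String result) {
        if (w == 0) {
            System.out.println(result);
            return;
        }
        if (n == 0) {
            // By the invariant, arr[0] == w here, so index 0 completes the set
            System.out.println(result + "," + 0);
            return;
        }
        // Branch 1: w is reachable without arr[n]
        if (costs[n - 1][w] == costs[n][w]) {
            printSets(arr, n - 1, w, result);
        }
        // Branch 2: arr[n] is part of the set
        if (arr[n] <= w && costs[n - 1][w - arr[n]] + arr[n] == costs[n][w]) {
            printSets(arr, n - 1, w - arr[n], result + "," + n);
        }
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 8, 9, 7};
        calSets(10, arr);
    }
}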

Note:

In some cases brute force is more practical than DP. The DP takes O(ParentAmount × totalChildren) time and space, whereas brute force takes O(2^n) time but only O(n) space for the subset currently being tried. Choose according to the problem.
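If the DP table's memory is the worry, one common variant (a swapped-in technique, not the one in this answer) keeps a single row: a boolean array where `reachable[s]` says whether some subset of the children seen so far sums to s, plus one `int` per sum recording which child first reached it, so a solution can still be recovered. That brings space down to O(ParentAmount) while time stays O(ParentAmount × totalChildren). This is a sketch assuming positive integer amounts; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only
public class OneRowSubsetSum {
    // Returns indices into amounts of one subset summing to target, or null if none.
    // Assumes amounts are positive integers and target >= 0.
    static List<Integer> find(int[] amounts, int target) {
        boolean[] reachable = new boolean[target + 1];
        int[] lastItem = new int[target + 1]; // index of the child that first reached this sum
        Arrays.fill(lastItem, -1);
        reachable[0] = true;

        for (int i = 0; i < amounts.length; i++) {
            // Go downwards so child i is used at most once (0/1, not unbounded)
            for (int s = target; s >= amounts[i]; s--) {
                if (!reachable[s] && reachable[s - amounts[i]]) {
                    reachable[s] = true;
                    lastItem[s] = i;
                }
            }
        }
        if (!reachable[target]) {
            return null;
        }

        // Walk back: each step removes the child that completed the current sum
        List<Integer> chosen = new ArrayList<>();
        for (int s = target; s > 0; s -= amounts[lastItem[s]]) {
            chosen.add(lastItem[s]);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // ParentID 2 from Table B: amounts of children 5, 6, 7, 8
        System.out.println(find(new int[]{500, 82, 5, 10}, 587)); // prints [2, 1, 0], i.e. children 7, 6, 5
    }
}
```

Backtracking works because the downward loop guarantees that when child i first reaches sum s, the sum s - amounts[i] was reached using only children before i, so indices strictly decrease along the walk.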

A bit of research tells me you can solve this in roughly N*2^P time, where N is the number of children and P is the number of bits needed to store the largest number. See, for example: http://en.wikipedia.org/wiki/Subset_sum_problem#Polynomial_time_approximate_algorithm

=============================================

As long as the number of children per parent is small, working out the powerset is fine, but note that the powerset of N children has 2^N elements, which grows very fast. 2^750 is hopelessly large, about 5.9 × 10^225.

There are plenty of functions for finding the powerset. I mostly work in Java; I know there is one in Guava, and I think there is also one in Apache Commons Math. Constructing a powerset is not difficult: intuitively, you can think of it as a binary tree of depth N, where every level asks "Do I include this element? Yes/No".
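The binary-tree intuition translates directly into a short recursive sketch (class and method names are hypothetical): at depth i the recursion branches on "leave element i out" and "put element i in", so the leaves of the tree are exactly the 2^N subsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only
public class PowerSetTree {
    // Collects every subset of items into out; depth is the index of the element being decided
    static <T> void build(List<T> items, int depth, List<T> current, List<List<T>> out) {
        if (depth == items.size()) {
            out.add(new ArrayList<>(current)); // a leaf: one complete subset
            return;
        }
        build(items, depth + 1, current, out); // "No": leave items[depth] out
        current.add(items.get(depth));         // "Yes": put it in
        build(items, depth + 1, current, out);
        current.remove(current.size() - 1);    // undo before returning to the level above
    }

    static <T> List<List<T>> powerSet(List<T> items) {
        List<List<T>> out = new ArrayList<>();
        build(items, 0, new ArrayList<>(), out);
        return out;
    }

    public static void main(String[] args) {
        // prints [[], [c], [b], [b, c], [a], [a, c], [a, b], [a, b, c]]
        System.out.println(powerSet(Arrays.asList("a", "b", "c")));
    }
}
```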

I don't write C#, but in pseudocode:

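static <T> Set<Set<T>> Powerset(Set<Set<T>> powerset, T newItem) {
    // Subsets that leave newItem out are the existing ones, unchanged
    Set<Set<T>> result = new HashSet<>(powerset);
    for (Set<T> set : powerset) {
        // Copy each subset before adding newItem, so the input sets are not modified
        Set<T> withItem = new HashSet<>(set);
        withItem.add(newItem);
        result.add(withItem);
    }
    return result;
}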

So this takes the powerset of N elements and returns the powerset of N+1 elements. You can just call it repeatedly to build the powerset, starting from the powerset of the empty set (a set containing only the empty set).

For larger numbers of children, rather than building the full powerset, use the same function but drop any set whose sum exceeds the target. Since the amounts are non-negative, a set's sum only increases as elements are added, so once the target is exceeded it cannot be right to continue. For example, assuming Object.hasValue() returns a number:

Set<Set<Object>> Powerset(Set<Set<Object>> powerset, Object newItem, int target) {
    // Every set in powerset already sums to <= target, so keep them all
    Set<Set<Object>> result = new HashSet<>(powerset);
    for (Set<Object> set : powerset) {
        Set<Object> withItem = new HashSet<>(set); // copy, so the input is left unchanged
        withItem.add(newItem);

        int sum = 0;
        for (Object o : withItem) {
            sum += o.hasValue();
        }
        // Prune: sums only grow as items are added, so an over-target set can never recover
        if (sum <= target) {
            result.add(withItem);
        }
    }
    return result;
}

Various optimisations are possible. For example, you should add the largest numbers first, since it's quickest to exclude numbers as early as possible: if a number is greater than the target, you only have to add it to one set if you process it first, but to up to 2^(n-1) sets if you process it last. You can also make each Set carry the sum of its components directly, eliminating the sum loop. And you can improve the space complexity by storing the subsets as a tree, with each element pointing to its parent element, and doing a DFS so that only one branch of the tree is in memory at a time, keeping only the successful branches.
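A sketch combining several of these optimisations, assuming positive integer amounts (class and method names are hypothetical): sort the amounts largest first, carry the running sum instead of re-adding each set, and explore the include/exclude tree depth-first so only the current branch is held in memory. It also adds one extra prune not mentioned above (give up when all remaining amounts together cannot reach the target) and stops at the first match. Pruning often cuts the search far below 2^N, but the worst case is still exponential.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only
public class PrunedSubsetSearch {
    // Returns one subset (as amounts) summing to target, or null. Assumes positive amounts.
    static List<Integer> find(int[] amounts, int target) {
        int[] sorted = amounts.clone();
        Arrays.sort(sorted);
        // Reverse into descending order so big amounts are decided (and pruned) first
        for (int i = 0, j = sorted.length - 1; i < j; i++, j--) {
            int tmp = sorted[i];
            sorted[i] = sorted[j];
            sorted[j] = tmp;
        }
        // suffix[i] = sum of sorted[i..]; used to stop when the rest cannot reach target
        long[] suffix = new long[sorted.length + 1];
        for (int i = sorted.length - 1; i >= 0; i--) {
            suffix[i] = suffix[i + 1] + sorted[i];
        }
        List<Integer> branch = new ArrayList<>();
        return dfs(sorted, suffix, 0, 0, target, branch) ? branch : null;
    }

    // depth = index of the amount being decided; sum = running total of the current branch
    static boolean dfs(int[] a, long[] suffix, int depth, long sum, int target, List<Integer> branch) {
        if (sum == target) {
            return true; // success: branch holds the answer
        }
        if (depth == a.length || sum + suffix[depth] < target) {
            return false; // nothing left, or everything left is too small
        }
        if (sum + a[depth] <= target) { // prune: never grow a branch past the target
            branch.add(a[depth]);
            if (dfs(a, suffix, depth + 1, sum + a[depth], target, branch)) {
                return true;
            }
            branch.remove(branch.size() - 1); // backtrack
        }
        return dfs(a, suffix, depth + 1, sum, target, branch); // exclude a[depth]
    }

    public static void main(String[] args) {
        System.out.println(find(new int[]{8, 7, 18, 93}, 100)); // ParentID 1; prints [93, 7]
    }
}
```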

If some parents have 750 children, you are going to run out of time. You should investigate some sort of parallel cloud computing solution if you want to get an answer before the sun burns out.

2^750 = 5922386521532855740161817506647119732883018558947359509044845726112560091729648156474603305162988578607512400425457279991804428268870599332596921062626576000993556884845161077691136496092218188572933193945756793025561702170624
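The exact value is easy to reproduce with Java's `BigInteger`, which also gives the worst case, where no luck is involved and every subset is checked at the same billion sums per second. This is a sketch; the class name is hypothetical and the 365.25-day year is an assumption of this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical helper class, for illustration only
public class PowerSetSize {
    static final long SECONDS_PER_YEAR = 31_557_600L; // assumes a 365.25-day year

    // 2^n, the number of subsets of an n-element set, as an exact integer
    static BigInteger subsetCount(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    // Years needed to test every subset at the given rate (the unlucky, worst-case search)
    static BigDecimal worstCaseYears(int n, long sumsPerSecond) {
        BigDecimal sums = new BigDecimal(subsetCount(n));
        BigDecimal perYear = BigDecimal.valueOf(sumsPerSecond)
                .multiply(BigDecimal.valueOf(SECONDS_PER_YEAR));
        return sums.divide(perYear, MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        BigInteger count = subsetCount(750);
        System.out.println("2^750 = " + count);
        System.out.println("digits: " + count.toString().length());
        System.out.println("years at 1e9 sums/s: " + worstCaseYears(750, 1_000_000_000L));
    }
}
```

The worst case comes out at roughly 1.9 × 10^209 years, so even the lucky scenario below is an enormous improvement over a full search.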

Now, let's assume we are really lucky, about as lucky as winning the lottery, and find the correct combination really early.

Now, let's assume that we have a fast computer that can calculate a billion sums a second.

It's going to take something like

6.332987 * 10^135 years

to find the answer. Now that is still an unimaginably long period of time. You can think of it as,

4.52356 * 10^125 ages of the universe.

Or, more expressively, longer than the age of the universe multiplied by the number of atoms in the universe. That's a long time.

http://xkcd.com/287/

This is some conjecture, but I suspect there is not enough material in the universe to build enough computers to parallelize the calculation enough to complete it before the sun runs out of fuel (within the bounds of existing technology).

I'd suggest a brute force power set approach should be abandoned. Computers are fast but they're not that fast.

Actually your situation isn't as dire as the 2^750 analysis suggests. Abandon the power set solution and go for dynamic programming instead. One option might look like:

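using System.Collections.Generic;

public static IEnumerable<int> FindSet(IEnumerable<int> amounts, int target)
{
    // results[s] holds one combination of amounts that sums to s
    var results = new Dictionary<int, List<int>>();
    results[0] = new List<int>();
    if (target == 0)
        return results[0];
    foreach(var amount in amounts)
    {
        // Walk the sums downwards so an amount added in this pass is never reused in the
        // same pass; unlike a Contains check, this still lets equal amounts from different
        // children be combined
        for(int i = target - amount; i >= 0; i--)
        {
            if(!results.ContainsKey(i) || results.ContainsKey(i + amount))
                continue;
            var combination = new List<int>(results[i]);
            combination.Add(amount);
            if (i + amount == target)
                return combination;
            results[i + amount] = combination;
        }
    }
    return null; // no combination reaches the target
}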

The premise is that we start by saying we know how to reach a sum of 0, with the empty set. Then, for each available amount, we go through and say that if we know how to reach a sum of n without using amount, then we also know how to reach a sum of n + amount, by taking the previous result and adding amount to it.

As you can see from the loops, this takes O(NK) steps, where N is the number of values in the set and K is the target sum (each step also copies a list, which is the cost the Node idea below removes). Much better than O(2^N).

This is just a sketch of the algorithm. There's probably plenty of room for performance tweaks; in particular, the lists could be replaced with some kind of "Node" class supporting a tree structure.

For a sketch of what I mean by Node class, something like:

public class Node
{
    public int Value;
    public Node Parent;

    public List<int> ToList()
    {
        if(Parent == null)
        {
            return new List<int> { Value };
        }
        var result = Parent.ToList();
        result.Add(Value);
        return result;
    }
}

Then you don't have to keep copying lists around.
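Putting the two ideas together, here is a rough Java rendering (Java being the language of the other examples on this page) of the dictionary approach where each reachable sum stores a node pointing at the combination it extends, so extending a combination is O(1) instead of a list copy. It is a hypothetical port, not the answerer's own code: it uses an array indexed by sum instead of a Dictionary, assumes positive integer amounts, and walks sums downward so each amount is used at most once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only
public class NodeSubsetSum {
    // One amount in a combination, linked to the rest of the combination (null = end)
    static final class Node {
        final int value;
        final Node parent;

        Node(int value, Node parent) {
            this.value = value;
            this.parent = parent;
        }

        // Follow parent links to materialise the combination, earliest amount first
        List<Integer> toList() {
            List<Integer> result = new ArrayList<>();
            for (Node n = this; n != null; n = n.parent) {
                result.add(n.value);
            }
            Collections.reverse(result);
            return result;
        }
    }

    // Returns one combination of positive amounts summing to target, or null if none
    static List<Integer> findSet(int[] amounts, int target) {
        Node[] tail = new Node[target + 1];   // tail[s] = last node of a combination summing to s
        boolean[] reached = new boolean[target + 1];
        reached[0] = true;                    // the empty combination, whose tail is null
        for (int amount : amounts) {
            for (int s = target - amount; s >= 0; s--) {
                if (reached[s] && !reached[s + amount]) {
                    reached[s + amount] = true;
                    tail[s + amount] = new Node(amount, tail[s]); // O(1): shares tail[s]'s chain
                }
            }
        }
        if (!reached[target]) {
            return null;
        }
        return tail[target] == null ? new ArrayList<>() : tail[target].toList();
    }

    public static void main(String[] args) {
        System.out.println(findSet(new int[]{500, 82, 5, 10}, 587)); // ParentID 2; prints [500, 82, 5]
    }
}
```

Because many reachable sums share the same prefix chain, memory stays at one small node per reachable sum rather than one full list per sum.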
