
Optimization Algorithm in C#

I have an optimization problem and I'm not sure where to go from here. My program tries to find the combination of inputs that returns the highest predicted R-squared value. The problem is that I have 21 inputs in total (List) and I need to pick a set of 15 of them. The formula for the total number of combinations is:

n! / (r! (n - r)!) = 21! / (15! (21 - 15)!) = 54,264 possible combinations
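
As a quick check of that arithmetic, here is a minimal sketch that computes the count without evaluating the large factorials. The class and method names (Combinatorics, Binomial) are made up for illustration and are not part of the original code.

```csharp
using System;

public static class Combinatorics
{
    // Number of ways to choose k items out of n, built up multiplicatively
    // so that the intermediate values stay small. Hypothetical helper.
    public static long Binomial(int n, int k)
    {
        if (k < 0 || k > n) return 0;
        k = Math.Min(k, n - k);          // C(n, k) == C(n, n - k)
        long result = 1;
        for (int i = 1; i <= k; i++)
        {
            // After step i, result equals C(n - k + i, i), so the division is exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

Combinatorics.Binomial(21, 15) gives 54264, matching the figure above.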

Running through every combination and calculating the predicted R-squared for each is obviously not ideal. Is there a better way/algorithm/method I can use to skip or narrow down the bad combinations, so that I only process as few combinations as possible? Here is my current pseudo code for this issue:

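public BestCombo GetBestCombo(List<List<MultipleRegressionInfo>> combosList)
{
   // Start below any attainable score: predicted R-squared can be negative,
   // so starting from the default of 0 could leave BestRSquaredCombo null.
   BestCombo bestCombo = new BestCombo { predRSquared = double.NegativeInfinity };

   foreach (var combo in combosList)
   {
      var predRsquared = CalculatePredictedRSquared(combo);

      // Keep the combination with the highest score seen so far.
      if (predRsquared > bestCombo.predRSquared)
      {
         bestCombo.predRSquared = predRsquared;
         bestCombo.BestRSquaredCombo = combo;
      }
   }

   return bestCombo;
}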

public class BestCombo
{
    public double predRSquared { get; set; }
    public IEnumerable<MultipleRegressionInfo> BestRSquaredCombo { get; set; }
}

public class MultipleRegressionInfo
{
    public List<double> input { get; set; }
    public List<double> output { get; set; }
}

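public double CalculatePredictedRSquared(List<MultipleRegressionInfo> combo)
{
    // One input series per entry. The output is read from the first entry,
    // which assumes every entry in the combo carries the same output series.
    var input = combo.Select(i => i.input).ToArray();
    var output = combo[0].output;

    // BuildMatrix, BuildVector, CalculateWithQR, CalculateYIntercept,
    // CalculateEstimates and GetPredRsquared are helpers not shown here.
    Matrix<double> matrix = BuildMatrix(input);
    Vector<double> vector = BuildVector(output);
    var coefficients = CalculateWithQR(matrix, vector);
    var y = CalculateYIntercept(coefficients, input, output);
    var estimateList = CalculateEstimates(coefficients, y, input, output);
    return GetPredRsquared(estimateList, output);
}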

54,264 is not enormous for a computer - it might be worth timing a few calls to compute R^2 and multiplying up to see just how long this would take.
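
A minimal sketch of that timing check, assuming the work to be measured can be wrapped in an Action. The names TimingEstimate and Extrapolate are made up for illustration.

```csharp
using System;
using System.Diagnostics;

public static class TimingEstimate
{
    // Times `samples` calls of `work` and scales the mean duration up to
    // `totalCalls`. Hypothetical helper, only a rough estimate.
    public static TimeSpan Extrapolate(Action work, int samples, long totalCalls)
    {
        if (samples <= 0) throw new ArgumentOutOfRangeException(nameof(samples));

        work(); // warm-up call so one-off JIT cost is not counted

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < samples; i++)
        {
            work();
        }
        stopwatch.Stop();

        double secondsPerCall = stopwatch.Elapsed.TotalSeconds / samples;
        return TimeSpan.FromSeconds(secondsPerCall * totalCalls);
    }
}
```

With the question's code this could be called as TimingEstimate.Extrapolate(() => CalculatePredictedRSquared(combosList[0]), 20, 54264). If the estimate comes out at seconds or a few minutes, the plain loop may already be good enough.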

There is a branch and bound algorithm for this sort of problem, which relies on the fact that R^2(A,B,C) >= R^2(A,B) - that is, R^2 can only decrease (or stay the same) when you drop a variable. Recursively search the space of all sets of variables of size at least 15, starting from the full set. After computing the R^2 for a set of variables, make recursive calls with the sets produced by dropping a single variable from it, where any such drop must be to the right of any existing gap (so A.CDE produces A..DE, A.C.E, and A.CD. but not ..CDE, which will be produced by .BCDE). This way each set is visited only once. You can terminate the recursion when you get down to the desired size of set, or when you find an R^2 that is no better than the best answer so far, where the best answer so far is the highest R^2 found for a set of the desired size: every set below that point is a subset, so it cannot do better.

If it happens that you often find R^2 values no better than the best answer so far, this will save time - but this is not guaranteed. You can attempt to improve the efficiency in two ways: by choosing to investigate the sets with the highest R^2 first, hoping that you find a new best answer good enough to rule out their siblings by the time you come to them, and by using a procedure to calculate R^2 for A.CDE that makes use of the calculations you have already done for ABCDE.
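
The search described above could be sketched roughly as follows. This is only an illustration under stated assumptions: variables are numbered 0 to n - 1, the score delegate stands in for whatever computes R^2 for a set of variables, and the names SubsetSearch and FindBest are made up. It also assumes the score never increases when a variable is dropped. That holds for ordinary R^2, but as far as I know it is not guaranteed for predicted R^2, in which case the pruning step could discard the true optimum. Reusing the calculations from the parent set is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsetSearch
{
    // Finds the best-scoring subset of exactly targetSize variables out of 0..n-1.
    // ASSUMPTION: score is monotone, i.e. dropping a variable never raises it.
    public static (double Score, int[] Subset) FindBest(
        int n, int targetSize, Func<IReadOnlyList<int>, double> score)
    {
        if (targetSize < 0 || targetSize > n)
            throw new ArgumentOutOfRangeException(nameof(targetSize));

        double bestScore = double.NegativeInfinity;
        int[] bestSubset = Array.Empty<int>();

        // kept is sorted ascending; lastDropped is the largest variable dropped so far.
        void Search(int[] kept, int lastDropped, double keptScore)
        {
            if (kept.Length == targetSize)
            {
                if (keptScore > bestScore) { bestScore = keptScore; bestSubset = kept; }
                return;
            }

            // Bound: everything below this node is a subset of kept,
            // so it cannot score higher than keptScore.
            if (keptScore <= bestScore) return;

            int stillToDrop = kept.Length - targetSize;
            var children = new List<(double Score, int[] Kept, int Dropped)>();
            for (int i = 0; i < kept.Length; i++)
            {
                // Only drop to the right of the existing gaps, so each set is seen once.
                if (kept[i] <= lastDropped) continue;
                // Later drops must come from kept[i+1..]; stop if too few remain.
                if (kept.Length - 1 - i < stillToDrop - 1) break;

                int[] child = kept.Where((_, j) => j != i).ToArray();
                children.Add((score(child), child, kept[i]));
            }

            // Visit the most promising children first to raise bestScore early.
            foreach (var c in children.OrderByDescending(c => c.Score))
            {
                Search(c.Kept, c.Dropped, c.Score);
            }
        }

        int[] all = Enumerable.Range(0, n).ToArray();
        Search(all, -1, score(all));
        return (bestScore, bestSubset);
    }
}
```

As a toy check, take the made-up weights 5, 1, 9, 3, 7 and a score that sums the weights of the kept variables (monotone because the weights are non-negative). FindBest(5, 3, score) then returns the subset 0, 2, 4, and two of the four first-level branches are pruned without being expanded. For the question's case the call would be FindBest(21, 15, ...) with a delegate that fits the regression on the given variables.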
