简体   繁体   中英

How do I write optimization code in R when the initial values are not directly contained within the objective function?

I'm trying to write a piece of optimization code in R to calculate a set of unknown values for a biological question concerning drosophila flies.

The dataframe is composed of 13 columns (just showing 9 in code below for clarity) with varying numbers of rows. The first three columns contain collected data and the rest of the columns are calculated using various formulas. Two of the columns, Missing_C and Missing_D are initially populated with empty data and in the optimization problem represent the initial values.

  Time.min. Prob_C Prob_D Miss_C Miss_D Event_C Event_D Risk_C Risk_D
1         0   1.00   1.00         0         0    0.00    0.00  86.00  78.00
2         5   0.98   0.97         0         0    1.93    1.98  84.07  76.02
3        16   0.84   0.95         0         0   10.67    1.90  73.40  74.12
4        17   0.50   0.75         0         0   21.02   12.85  52.38  61.27
5        20   0.30   0.50         0         0   14.97   15.32  37.42  45.95

As an example of the some of the formulas used, Event_C and Risk_C are calculated with a for loop as follows:

#define values for events_c and risk_c with for loops`

temp_events_c <-vector()
temp_risk_c <-vector()


for (i in 2:no_rows) {
  temp_events_c <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  events_c[i] <- temp_events_c
  temp_risk_c <- risk_c[i-1] - miss_c[i-1] - events_c[i]
  risk_c[i] <- temp_risk_c
}

From this data, I also have a single, collected, value (9.1 in this instance) which relates to the values in the table. The following function defines the relationship to this value to columns Event_C, Event_D and two columns not shown in the above, Expected_C and Expected_D where the sums of those columns are represented by x[1], x[2], x[3], x[4]:

fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]

I then would like to use a minimization algorithm ie slsqp from nloptr to calculate the values in the Miss_C and Miss_D which ultimately satisfy this single value. The extra code for the optimization would be something like this:

x0 <- c(Miss_C,Miss_D)

heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 9.1  # heq == 0

slsqp(x0, fn, gr = NULL, 
      hin = NULL, heq = heq)

Obviously, that doesn't work as the initial values are not directly contained within the function that needs to be solved and that is the point I am stuck at! I'm not sure if this is an optimization problem as such or more of a general R coding question - either way, any help would be much appreciated.

Cheers, Andrew

* Edit - complete code added as per request *

#input variables

time_vector <- c(0,5,16,17,20)

prob_c <- c(1,0.977,0.835,0.5,0.30)

prob_d <- c(1,0.974,0.949,0.75,0.50)

miss_c <- c(0,0,0,0,0)

miss_d <- c(0,0,0,0,0)

#get number of rows

no_rows <- length(time_vector)

#fill events columns with dummy data

events_c <- c(0:(no_rows - 1))
events_d <- c(0:(no_rows - 1))

#define starting number at risk

risk_c_t0 <- 86
risk_d_t0 <- 78


#add t0 risk to each column

risk_c <- risk_c_t0
risk_d <-risk_d_t0

#fill risk columns with dummy data

risk_c[2:no_rows] <- c(2:no_rows)
risk_d[2:no_rows] <- c(2:no_rows)


#re-define values for events_c and risk_c with for loops

temp_events_c <-vector()
temp_risk_c <-vector()


for (i in 2:no_rows) {
  temp_events_c <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  events_c[i] <- temp_events_c
  temp_risk_c <- risk_c[i-1] - miss_c[i-1] - events_c[i]
  risk_c[i] <- temp_risk_c
}

#re-define values for events_t with for loops

temp_events_d <-vector()
temp_risk_d <-vector()

for (j in 2:no_rows) {
  temp_events_d <- ((prob_d[j] * risk_d[j-1]) - (prob_d[j] * miss_d[j-1]) - (prob_d[j-1] * risk_d[j-1]) + (prob_d[j-1] * miss_d[j-1])) / (prob_d[j] - (2 * prob_d[j-1]))
  events_d[j] <- temp_events_d
  temp_risk_d <- risk_d[j-1] - miss_d[j-1] - events_d[j]
  risk_d[j] <- temp_risk_d
}

#calculate total risk, events and expected
total_risk <- risk_c + risk_d

total_events <- events_c + events_d

expected_c <- (risk_c * (total_events/total_risk))

expected_d <- (risk_d * (total_events/total_risk))

#place values into dataframe

df1 <- data.frame(time_vector,prob_c,prob_d, miss_c, miss_d, events_c, events_d, risk_c, risk_d, total_risk, total_events, expected_c, expected_d)

#sum of values
sum_events_C <- sum(events_c)

sum_events_d <- sum(events_d)

sum_expected_c <- sum(expected_c)

sum_expected_d <- sum(expected_d)

#chi_sq formula
chi_sq_combo <- (((sum_events_C - sum_expected_c)^2)/sum_expected_c) + (((sum_events_d - sum_expected_d)^2)/sum_expected_d)


#### end of table calculations before sim

#x <- c(sum_events_C, sum_expected_c, sum_events_d, sum_expected_d)

#x0 <- c(miss_c,miss_d) #inital values


#fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]

#heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 6.5  # heq == 0


#slsqp(x0, fn, gr = NULL, 
 #     hin = NULL, heq = heq)

Rephrasing the comments above, I believe the problem is to use optimization to find two values which yield a target chi-square value. A complication which may cause problems is that there are likely many solutions that yield the target, so it might be necessary to add some other requirement to make the answer unique.

To do this, you need a function of two variables which calculates the square of the difference between the chi-square value using those variables and the target value, and then you minimize that.

For example,

fn2 <- function(x) {
  c <- x[1]
  d <- x[2]
  chisq <- (((c - sum_expected_c)^2)/sum_expected_c) + 
           (((d - sum_expected_d)^2)/sum_expected_d)
  (chisq - 6.5)^2
}
for (i in 1:no_rows) {
  x0 <- c(df1$miss_c[i],df1$miss_d[i]) #initial values
  res <- nloptr::slsqp(x0, fn2)
  miss_c[i] <- res$par[1]
  miss_d[i] <- res$par[2]
}

This gives the same values all 5 times, so I might not have understood you completely.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM