How do I write optimization code in R when the initial values are not directly contained within the objective function?
I'm trying to write a piece of optimization code in R to calculate a set of unknown values for a biological question concerning Drosophila flies.
The dataframe is composed of 13 columns (only 9 are shown below for clarity) with varying numbers of rows. The first three columns contain collected data and the rest are calculated using various formulas. Two of the columns, Miss_C and Miss_D, are initially populated with zeros and represent the initial values in the optimization problem.
  Time.min. Prob_C Prob_D Miss_C Miss_D Event_C Event_D Risk_C Risk_D
1         0   1.00   1.00      0      0    0.00    0.00  86.00  78.00
2         5   0.98   0.97      0      0    1.93    1.98  84.07  76.02
3        16   0.84   0.95      0      0   10.67    1.90  73.40  74.12
4        17   0.50   0.75      0      0   21.02   12.85  52.38  61.27
5        20   0.30   0.50      0      0   14.97   15.32  37.42  45.95
As an example of some of the formulas used, Event_C and Risk_C are calculated with a for loop as follows:
#define values for events_c and risk_c with a for loop
for (i in 2:no_rows) {
  events_c[i] <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  risk_c[i] <- risk_c[i-1] - miss_c[i-1] - events_c[i]
}
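As a side note, the four-term numerator in the loop above factors into (prob_c[i] - prob_c[i-1]) * (risk_c[i-1] - miss_c[i-1]), so a single step of the recursion can be written more compactly. The sketch below is only an illustration: `event_step` is a hypothetical helper name, and the inputs are row 1 of the table plus the unrounded prob_c value 0.977 from the full code further down.

```r
# Hypothetical helper: one step of the recursion, using the factored numerator
# (p_now - p_prev) * (risk_prev - miss_prev), which is algebraically the same
# as the four-term expression used in the loop.
event_step <- function(p_now, p_prev, risk_prev, miss_prev) {
  events <- (p_now - p_prev) * (risk_prev - miss_prev) / (p_now - 2 * p_prev)
  risk <- risk_prev - miss_prev - events
  c(events = events, risk = risk)
}

# Row 2 of the C group, starting from row 1 (risk 86, miss 0, prob 1)
step2 <- event_step(p_now = 0.977, p_prev = 1, risk_prev = 86, miss_prev = 0)
print(round(step2, 2))  # events 1.93, risk 84.07, matching row 2 of the table
```

Writing the step as a function of miss_prev also makes it explicit that every later row depends on the miss values.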
From this data, I also have a single collected value (9.1 in this instance) that relates to the values in the table. The following function defines how this value relates to the columns Event_C and Event_D and to two columns not shown above, Expected_C and Expected_D, where x[1], x[2], x[3] and x[4] are the sums of Event_C, Expected_C, Event_D and Expected_D respectively:
fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]
I would then like to use a minimization algorithm, i.e. slsqp from nloptr, to calculate the values in Miss_C and Miss_D that ultimately satisfy this single value. The extra code for the optimization would be something like this:
x0 <- c(Miss_C,Miss_D)
heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 9.1 # heq == 0
slsqp(x0, fn, gr = NULL,
hin = NULL, heq = heq)
Obviously, that doesn't work, because the initial values are not directly contained within the function that needs to be solved, and that is the point where I am stuck! I'm not sure whether this is an optimization problem as such or more of a general R coding question. Either way, any help would be much appreciated.
Cheers, Andrew
* Edit - complete code added as per request *
#input variables
time_vector <- c(0,5,16,17,20)
prob_c <- c(1,0.977,0.835,0.5,0.30)
prob_d <- c(1,0.974,0.949,0.75,0.50)
miss_c <- c(0,0,0,0,0)
miss_d <- c(0,0,0,0,0)
#get number of rows
no_rows <- length(time_vector)
#fill events columns with dummy data
events_c <- c(0:(no_rows - 1))
events_d <- c(0:(no_rows - 1))
#define starting number at risk
risk_c_t0 <- 86
risk_d_t0 <- 78
#add t0 risk to each column
risk_c <- risk_c_t0
risk_d <- risk_d_t0
#fill risk columns with dummy data
risk_c[2:no_rows] <- c(2:no_rows)
risk_d[2:no_rows] <- c(2:no_rows)
#re-define values for events_c and risk_c with a for loop
for (i in 2:no_rows) {
  events_c[i] <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  risk_c[i] <- risk_c[i-1] - miss_c[i-1] - events_c[i]
}
#re-define values for events_d and risk_d with a for loop
for (j in 2:no_rows) {
  events_d[j] <- ((prob_d[j] * risk_d[j-1]) - (prob_d[j] * miss_d[j-1]) - (prob_d[j-1] * risk_d[j-1]) + (prob_d[j-1] * miss_d[j-1])) / (prob_d[j] - (2 * prob_d[j-1]))
  risk_d[j] <- risk_d[j-1] - miss_d[j-1] - events_d[j]
}
#calculate total risk, events and expected
total_risk <- risk_c + risk_d
total_events <- events_c + events_d
expected_c <- (risk_c * (total_events/total_risk))
expected_d <- (risk_d * (total_events/total_risk))
#place values into dataframe
df1 <- data.frame(time_vector,prob_c,prob_d, miss_c, miss_d, events_c, events_d, risk_c, risk_d, total_risk, total_events, expected_c, expected_d)
#sum of values
sum_events_C <- sum(events_c)
sum_events_d <- sum(events_d)
sum_expected_c <- sum(expected_c)
sum_expected_d <- sum(expected_d)
#chi_sq formula
chi_sq_combo <- (((sum_events_C - sum_expected_c)^2)/sum_expected_c) + (((sum_events_d - sum_expected_d)^2)/sum_expected_d)
#### end of table calculations before sim
#x <- c(sum_events_C, sum_expected_c, sum_events_d, sum_expected_d)
#x0 <- c(miss_c,miss_d) #initial values
#fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]
#heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 6.5 # heq == 0
#slsqp(x0, fn, gr = NULL,
# hin = NULL, heq = heq)
Rephrasing the comments above, I believe the problem is to use optimization to find two values that yield a target chi-square value. One complication is that there are likely many solutions that yield the target, so it might be necessary to add some other requirement to make the answer unique.
To do this, you need a function of two variables that computes the chi-square value from those variables and returns the squared difference between it and the target value; you then minimize that function.
For example,
# uses sum_expected_c, sum_expected_d, df1, no_rows, miss_c and miss_d from the question's code
fn2 <- function(x) {
  c <- x[1]
  d <- x[2]
  chisq <- (((c - sum_expected_c)^2)/sum_expected_c) +
    (((d - sum_expected_d)^2)/sum_expected_d)
  (chisq - 6.5)^2
}
for (i in 1:no_rows) {
  x0 <- c(df1$miss_c[i], df1$miss_d[i]) #initial values
  res <- nloptr::slsqp(x0, fn2)
  miss_c[i] <- res$par[1]
  miss_d[i] <- res$par[2]
}
This gives the same values all 5 times, so I might not have understood you completely.
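If the intent is instead for the chi-square value itself to depend on the miss values, the objective has to rebuild the table from them on every call. The following is a minimal sketch under that assumption, not a definitive implementation: `events_and_risk`, `chi_sq_from_miss` and `objective` are hypothetical names, the target of 6.5 is taken from the commented-out heq in the question, and base R's `optim` with `method = "BFGS"` is swapped in for `nloptr::slsqp` only to avoid the extra dependency.

```r
# Inputs copied from the question
prob_c <- c(1, 0.977, 0.835, 0.5, 0.30)
prob_d <- c(1, 0.974, 0.949, 0.75, 0.50)
risk_c_t0 <- 86
risk_d_t0 <- 78
target <- 6.5  # assumption: target chi-square from the commented-out heq
n <- length(prob_c)

# Hypothetical helper: the question's recursion for one group, as a function
# of the miss vector (numerator written in factored form)
events_and_risk <- function(prob, miss, risk_t0) {
  events <- numeric(length(prob))
  risk <- numeric(length(prob))
  risk[1] <- risk_t0
  for (i in 2:length(prob)) {
    events[i] <- (prob[i] - prob[i - 1]) * (risk[i - 1] - miss[i - 1]) /
      (prob[i] - 2 * prob[i - 1])
    risk[i] <- risk[i - 1] - miss[i - 1] - events[i]
  }
  list(events = events, risk = risk)
}

# Hypothetical helper: rebuild the table from the miss values and return
# the chi-square statistic, following the formulas in the question
chi_sq_from_miss <- function(miss_c, miss_d) {
  grp_c <- events_and_risk(prob_c, miss_c, risk_c_t0)
  grp_d <- events_and_risk(prob_d, miss_d, risk_d_t0)
  total_risk <- grp_c$risk + grp_d$risk
  total_events <- grp_c$events + grp_d$events
  expected_c <- grp_c$risk * total_events / total_risk
  expected_d <- grp_d$risk * total_events / total_risk
  (sum(grp_c$events) - sum(expected_c))^2 / sum(expected_c) +
    (sum(grp_d$events) - sum(expected_d))^2 / sum(expected_d)
}

# Objective: x holds miss_c followed by miss_d, as in x0 <- c(miss_c, miss_d)
objective <- function(x) {
  (chi_sq_from_miss(x[1:n], x[(n + 1):(2 * n)]) - target)^2
}

x0 <- rep(0, 2 * n)  # all miss values start at zero
chi_sq_start <- chi_sq_from_miss(x0[1:n], x0[(n + 1):(2 * n)])
res <- optim(x0, objective, method = "BFGS")
chi_sq_found <- chi_sq_from_miss(res$par[1:n], res$par[(n + 1):(2 * n)])
```

Because ten unknowns are being fitted to a single target, the solution found depends on the starting point, and nothing in this sketch keeps the miss values non-negative, so check res$value and chi_sq_found before trusting the result. The same objective could be passed to nloptr::slsqp(x0, objective) in place of optim.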