
How do I write optimization code in R when the initial values are not directly contained within the objective function?

I'm trying to write a piece of optimization code in R to calculate a set of unknown values for a biological question concerning Drosophila flies.

The dataframe is composed of 13 columns (only 9 are shown in the code below for clarity) with varying numbers of rows. The first three columns contain collected data and the rest of the columns are calculated using various formulas. Two of the columns, Miss_C and Miss_D, are initially populated with zeros and represent the initial values in the optimization problem.

  Time.min. Prob_C Prob_D Miss_C Miss_D Event_C Event_D Risk_C Risk_D
1         0   1.00   1.00      0      0    0.00    0.00  86.00  78.00
2         5   0.98   0.97      0      0    1.93    1.98  84.07  76.02
3        16   0.84   0.95      0      0   10.67    1.90  73.40  74.12
4        17   0.50   0.75      0      0   21.02   12.85  52.38  61.27
5        20   0.30   0.50      0      0   14.97   15.32  37.42  45.95

As an example of some of the formulas used, Event_C and Risk_C are calculated with a for loop as follows:

#define values for events_c and risk_c with for loops

temp_events_c <-vector()
temp_risk_c <-vector()


for (i in 2:no_rows) {
  temp_events_c <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  events_c[i] <- temp_events_c
  temp_risk_c <- risk_c[i-1] - miss_c[i-1] - events_c[i]
  risk_c[i] <- temp_risk_c
}
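The recurrence in the loop above can also be written as a small function, which makes it easier to recompute the columns whenever the Miss values change. This is only a sketch: `compute_events_risk` is a hypothetical name that does not appear in the original code, and the numerator has been factored as (risk - miss) * (prob[i] - prob[i-1]), which is algebraically the same expression as in the loop.

```r
# Hypothetical helper (not part of the original code): the same recurrence
# as the for loop above, wrapped in a function of prob, miss and the
# starting number at risk.
compute_events_risk <- function(prob, miss, risk_t0) {
  n <- length(prob)
  events <- numeric(n)   # events[1] stays 0, as in the first table row
  risk <- numeric(n)
  risk[1] <- risk_t0
  for (i in 2:n) {
    # number still at risk once the previous row's missing flies are removed
    at_risk <- risk[i - 1] - miss[i - 1]
    # factored form of the numerator used in the original loop
    events[i] <- at_risk * (prob[i] - prob[i - 1]) / (prob[i] - 2 * prob[i - 1])
    risk[i] <- at_risk - events[i]
  }
  list(events = events, risk = risk)
}

# Usage with the control-group inputs posted in the question
prob_c <- c(1, 0.977, 0.835, 0.5, 0.30)
control <- compute_events_risk(prob_c, miss = rep(0, 5), risk_t0 = 86)
round(control$events, 2)
round(control$risk, 2)
```

With all Miss values at zero, the rounded results should agree with the Event_C and Risk_C columns of the table above.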

From this data, I also have a single collected value (9.1 in this instance) which relates to the values in the table. The following function defines the relationship between this value and the columns Event_C and Event_D, plus two columns not shown above, Expected_C and Expected_D. The column sums are passed as x[1] (Event_C), x[2] (Expected_C), x[3] (Event_D) and x[4] (Expected_D):

fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]

I would then like to use a minimization algorithm, e.g. slsqp from the nloptr package, to calculate the values in Miss_C and Miss_D that ultimately satisfy this single value. The extra code for the optimization would be something like this:

x0 <- c(Miss_C,Miss_D)

heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 9.1  # heq == 0

slsqp(x0, fn, gr = NULL, 
      hin = NULL, heq = heq)

Obviously, that doesn't work, as the initial values are not directly contained within the function that needs to be solved, and that is the point where I am stuck! I'm not sure whether this is an optimization problem as such or more of a general R coding question. Either way, any help would be much appreciated.

Cheers, Andrew

* Edit - complete code added as per request *

#input variables

time_vector <- c(0,5,16,17,20)

prob_c <- c(1,0.977,0.835,0.5,0.30)

prob_d <- c(1,0.974,0.949,0.75,0.50)

miss_c <- c(0,0,0,0,0)

miss_d <- c(0,0,0,0,0)

#get number of rows

no_rows <- length(time_vector)

#fill events columns with dummy data

events_c <- c(0:(no_rows - 1))
events_d <- c(0:(no_rows - 1))

#define starting number at risk

risk_c_t0 <- 86
risk_d_t0 <- 78


#add t0 risk to each column

risk_c <- risk_c_t0
risk_d <-risk_d_t0

#fill risk columns with dummy data

risk_c[2:no_rows] <- c(2:no_rows)
risk_d[2:no_rows] <- c(2:no_rows)


#re-define values for events_c and risk_c with for loops

temp_events_c <-vector()
temp_risk_c <-vector()


for (i in 2:no_rows) {
  temp_events_c <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  events_c[i] <- temp_events_c
  temp_risk_c <- risk_c[i-1] - miss_c[i-1] - events_c[i]
  risk_c[i] <- temp_risk_c
}

#re-define values for events_d and risk_d with for loops

temp_events_d <-vector()
temp_risk_d <-vector()

for (j in 2:no_rows) {
  temp_events_d <- ((prob_d[j] * risk_d[j-1]) - (prob_d[j] * miss_d[j-1]) - (prob_d[j-1] * risk_d[j-1]) + (prob_d[j-1] * miss_d[j-1])) / (prob_d[j] - (2 * prob_d[j-1]))
  events_d[j] <- temp_events_d
  temp_risk_d <- risk_d[j-1] - miss_d[j-1] - events_d[j]
  risk_d[j] <- temp_risk_d
}

#calculate total risk, events and expected
total_risk <- risk_c + risk_d

total_events <- events_c + events_d

expected_c <- (risk_c * (total_events/total_risk))

expected_d <- (risk_d * (total_events/total_risk))

#place values into dataframe

df1 <- data.frame(time_vector,prob_c,prob_d, miss_c, miss_d, events_c, events_d, risk_c, risk_d, total_risk, total_events, expected_c, expected_d)

#sum of values
sum_events_C <- sum(events_c)

sum_events_d <- sum(events_d)

sum_expected_c <- sum(expected_c)

sum_expected_d <- sum(expected_d)

#chi_sq formula
chi_sq_combo <- (((sum_events_C - sum_expected_c)^2)/sum_expected_c) + (((sum_events_d - sum_expected_d)^2)/sum_expected_d)


#### end of table calculations before sim

#x <- c(sum_events_C, sum_expected_c, sum_events_d, sum_expected_d)

#x0 <- c(miss_c,miss_d) #initial values


#fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]

#heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 6.5  # heq == 0


#slsqp(x0, fn, gr = NULL, 
 #     hin = NULL, heq = heq)

Rephrasing the comments above, I believe the problem is to use optimization to find two values which yield a target chi-square value. A complication is that there are likely many solutions that yield the target, so it might be necessary to add some other requirement to make the answer unique.

To do this, you need a function of two variables which calculates the square of the difference between the chi-square value obtained from those variables and the target value, and then you minimize that function.

For example,

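# Squared distance between the chi-square implied by x and the target (6.5).
# sum_expected_c and sum_expected_d come from the question's code.
fn2 <- function(x) {
  obs_c <- x[1]   # renamed from c/d to avoid masking base::c
  obs_d <- x[2]
  chisq <- (((obs_c - sum_expected_c)^2)/sum_expected_c) +
           (((obs_d - sum_expected_d)^2)/sum_expected_d)
  (chisq - 6.5)^2
}
# One optimization per row, started from that row's miss values
for (i in 1:no_rows) {
  x0 <- c(df1$miss_c[i],df1$miss_d[i]) #initial values
  res <- nloptr::slsqp(x0, fn2)
  miss_c[i] <- res$par[1]
  miss_d[i] <- res$par[2]
}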

This gives the same values all 5 times, so I might not have understood you completely.
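Another possible reading of the question is that the objective should recompute the whole table from the Miss values, so that the optimizer's parameters really are miss_c and miss_d. The sketch below follows that reading under several assumptions: `chi_sq_from_miss` and `objective` are hypothetical names, base R `optim` with method "L-BFGS-B" is swapped in for `slsqp` so the example runs without extra packages, the bounds 0 to 5 on each Miss value are an arbitrary choice to keep the numbers at risk positive, and the target 6.5 is the value used in the code above.

```r
# Hypothetical wrapper: rebuild events, risk and expected values from the
# Miss values in x = c(miss_c, miss_d) and return the chi-square statistic.
chi_sq_from_miss <- function(x, prob_c, prob_d, risk_c_t0, risk_d_t0) {
  n <- length(prob_c)
  miss_c <- x[1:n]
  miss_d <- x[(n + 1):(2 * n)]
  # same recurrence as the question's for loops, for one group
  step <- function(prob, miss, risk_t0) {
    events <- numeric(n)
    risk <- numeric(n)
    risk[1] <- risk_t0
    for (i in 2:n) {
      at_risk <- risk[i - 1] - miss[i - 1]
      events[i] <- at_risk * (prob[i] - prob[i - 1]) / (prob[i] - 2 * prob[i - 1])
      risk[i] <- at_risk - events[i]
    }
    list(events = events, risk = risk)
  }
  grp_c <- step(prob_c, miss_c, risk_c_t0)
  grp_d <- step(prob_d, miss_d, risk_d_t0)
  total_events <- grp_c$events + grp_d$events
  total_risk <- grp_c$risk + grp_d$risk
  exp_c <- sum(grp_c$risk * total_events / total_risk)
  exp_d <- sum(grp_d$risk * total_events / total_risk)
  (sum(grp_c$events) - exp_c)^2 / exp_c + (sum(grp_d$events) - exp_d)^2 / exp_d
}

prob_c <- c(1, 0.977, 0.835, 0.5, 0.30)
prob_d <- c(1, 0.974, 0.949, 0.75, 0.50)
target <- 6.5   # assumption: the target used in the code above

# Squared distance between the recomputed chi-square and the target
objective <- function(x) {
  (chi_sq_from_miss(x, prob_c, prob_d, 86, 78) - target)^2
}

x0 <- rep(0, 2 * length(prob_c))   # all Miss values start at zero
chi0 <- chi_sq_from_miss(x0, prob_c, prob_d, 86, 78)

# Assumed bounds: each Miss value is kept between 0 and 5
res <- optim(x0, objective, method = "L-BFGS-B", lower = 0, upper = 5)
res$par     # candidate miss_c (first 5) and miss_d (last 5)
res$value   # remaining squared distance to the target
```

Because many combinations of Miss values can give the same chi-square, the result depends on the starting point and the bounds, so an extra requirement would still be needed to make the answer unique. Note also that the last element of each Miss vector never enters the recurrence, so the optimizer has no reason to move it.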
