
MLE application with gekko in Python

I want to implement MLE (maximum likelihood estimation) with the gekko package in Python. Suppose we have a DataFrame with two columns, ['Loss', 'Target'], and 500 rows.
First we import the packages we need:

from gekko import GEKKO
import numpy as np
import pandas as pd

Then we create the DataFrame like this:

My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
My_DataFrame 

It looks like this:
[image]

Some entries of the ['Target'] column are computed with the formula shown in the picture below; the rest remain zero (I explain more further down). The two parameters of the formula are 'Kasi' and 'Betaa'. I want to find the values of these two parameters that maximize the sum of My_DataFrame['Target'].

[image]

Here is the code I wrote for this purpose. First I define my objective function:

def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
    
    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
        Array[0] represents 'Kasi' and Array[1] represents 'Betaa'

    [returns]:
        + returns a pandas.series.
        actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item. 
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*(  1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))

    return My_DataFrame["Target"]

If the for loop in obj_function is confusing, the picture below contains a brief example; otherwise skip this part:

[image]

Then we just need to run the optimization. I use the gekko package for this. I want to find the best values of 'Kasi' and 'Betaa', so I have two decision variables and no constraints. Let's get started:

# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()

# now i want to specify my variables ('Kasi'  and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd , value = [0.7 , 20])
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'

qw.Maximize(np.sum(obj_function(x)))

Then, when I try to solve the optimization with qw.solve():

qw.solve()

But I got this error:

Exception: This steady-state IMODE only allows scalar values.

How can I fix this problem? (The complete script is gathered in the next section for convenience.)

from gekko import GEKKO
import numpy as np
import pandas as pd


My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)

def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
    
    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
        Array[0] represents 'Kasi' and Array[1] represents 'Betaa'

    [returns]:
        + returns a pandas.series.
        actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item. 
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*(  1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))

    return My_DataFrame["Target"]



# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()

# now i want to specify my variables ('Kasi'  and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'

qw.Maximize(qw.sum(obj_function(x)))

The proposed potential script is here:

from gekko import GEKKO
import numpy as np
import pandas as pd


My_DataFrame = pd.read_excel("[<FILE_PATH_IN_YOUR_MACHINE>]\\Losses.xlsx")
# I'll put a link to the "Losses.xlsx" file at the end of my explanation
# so you can download it from my Google Drive.


loss = My_DataFrame["Loss"]
def obj_function(x):
    k,b = x
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target


qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)

# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi
   
# bounds
k,b = x
k.lower=0.1; k.upper=0.8
b.lower=10;  b.upper=500
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])

Python output:

objective function = -1155.4861315885942
b = 500.0
k = 0.1

Note that in the Python output b represents "Betaa" and k represents "Kasi".
The output seems a bit strange, so I decided to test it with Microsoft Excel Solver.
(I put the link to the Excel file at the end of my explanation so you can check it yourself if you want.) As the picture below shows, the Excel optimization finished and found an optimal solution.
[image]

Excel output:

objective function = -108.21
Betaa = 32.53161
Kasi = 0.436246

As you can see, there is a huge difference between the Python output and the Excel output, and Excel seems to perform well. So I guess the problem still stands and the proposed Python script is not performing well.
The Implementation_in_Excel.xls file with the Microsoft Excel optimization is available here (you can also see the optimization options under the Data tab --> Analysis --> Solver).
The data used for optimization in Excel and Python is the same and is available here (it is simple and contains 501 rows and 1 column).
*If you can't download the files, let me know and I'll update them.

qw.Maximize() only sets the objective of the optimization; you still need to call solve() on your model.

The initialization applies the whole list [0.7, 20] to each variable. A scalar should be used to initialize the value of each variable instead, such as:

x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi

Another issue is that gekko needs its own special functions to perform automatic differentiation for the solvers. For the objective function, switch to the gekko version of summation:

qw.Maximize(qw.sum(obj_function(x)))

If loss is computed by changing the values of x, then the objective function has logical expressions that need special treatment for solution with gradient-based solvers. Try using the if3() function for a conditional statement, or else slack variables (preferred). The objective function is evaluated once to build symbolic expressions that are then compiled to byte-code and solved with one of the solvers. The symbolic expressions are found in the gk0_model.apm file in the folder given by m.path (qw.path for the model in this question).
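In this question the `iloss > 160` test involves only the data, not the decision variables, so an ordinary Python `if` is resolved once while the expression is being built and nothing conditional reaches the solver. Below is a minimal sketch of doing that filtering up front, in plain Python and without gekko. The names `THRESHOLD`, `linspace`, and `excesses` are made up for illustration, and the data is the synthetic grid from the question rather than the real loss file.

```python
# Illustrative only: resolve the data-dependent condition before any
# solver expression is built. All names here are hypothetical.
THRESHOLD = 160.0

def linspace(start, stop, num):
    """Evenly spaced values with both endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

# Same grid as the question's My_DataFrame["Loss"] column.
loss = linspace(-555.795, 477.841, 500)

# The condition depends only on constants, so it is evaluated here,
# once, rather than inside the optimization problem.
excesses = [x - THRESHOLD for x in loss if x > THRESHOLD]

print(len(excesses), "of", len(loss), "losses exceed the threshold")
```

Only if the comparison involved a gekko variable (for example, a threshold that is itself being optimized) would a construct such as if3() or slack variables come into play.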

Response to Edit

Thanks for posting an edit with the complete code. Here is a potential solution:

from gekko import GEKKO
import numpy as np
import pandas as pd

loss = np.linspace(-555.795 , 477.841 , 500)
def obj_function(x):
    k,b = x
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi
# bounds
k,b = x
k.lower=0.6; k.upper=0.8
b.lower=10;  b.upper=30
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])

The solver reaches the bounds at the solution. The bounds may need to be widened so that arbitrary limits do not determine the solution.
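One way to notice this situation is to compare each reported value with its bounds after the solve. The helper below is a hypothetical sketch in plain Python (`at_bound` and its tolerance are not part of gekko); it is applied to the values the question reports for the run with bounds [0.1, 0.8] and [10, 500], and to the Excel optimum for comparison.

```python
def at_bound(value, lower, upper, rel_tol=1e-6):
    """Return True if value sits (numerically) on either bound."""
    span = upper - lower
    return (value - lower) <= rel_tol * span or (upper - value) <= rel_tol * span

# Values reported in the question for the gekko run with wider bounds.
print(at_bound(0.1, 0.1, 0.8))       # k against [0.1, 0.8]
print(at_bound(500.0, 10, 500))      # b against [10, 500]

# The Excel optimum lies strictly inside the same bounds.
print(at_bound(0.436246, 0.1, 0.8))
print(at_bound(32.53161, 10, 500))
```

When both variables stop on a bound, as in the first two checks, the bounds are shaping the answer, which points either to limits that are too tight or to a problem in the objective itself.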


Update

Here is a final solution. The objective function in the code had a problem, so it had to be fixed. Here is the correct script:

from gekko import GEKKO
import numpy as np
import pandas as pd

My_DataFrame = pd.read_excel("<FILE_PATH_IN_YOUR_MACHINE>\\Losses.xlsx")
loss = My_DataFrame["Loss"]

def obj_function(x):
    k,b = x
    q = ((-1/k)-1)
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log(1/b) + q* ( qw.log(b+k*(iloss-160)) - qw.log(b))
            target.append(t)
    return target

qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)

# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi

qw.Maximize(qw.sum(obj_function(x)))
qw.solve()
print('Kasi = ',x[0].value)
print('Betaa = ',x[1].value)

Output:

 The final value of the objective function is  108.20609317143486
 
 ---------------------------------------------------
 Solver         :  IPOPT (v3.12)
 Solution time  :  0.031200000000000006 sec
 Objective      :  108.20609317143486
 Successful solution
 ---------------------------------------------------
 

Kasi =  [0.436245842]
Betaa =  [32.531632983]

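The results match the optimization result from Microsoft Excel to the reported precision. gekko prints the objective with the opposite sign (108.206 versus Excel's -108.21), which is consistent with it solving the maximization as a minimization of the negated objective.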
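Reading the scripts side by side, the change appears to be where the exponent applies. The earlier objective computes log((1/b)*(1 + (k*(x-160)/b)**q)), so the power q = -1/k - 1 covers only k*(x-160)/b, while the final script computes log(1/b) + q*(log(b + k*(x-160)) - log(b)), which equals log((1/b)*(1 + k*(x-160)/b)**q). The sketch below checks this numerically with the standard library only. The function names and the sample loss `x = 200.0` are chosen for illustration and are not from the original post.

```python
import math

U = 160.0  # threshold used in the question

def misplaced(k, b, x):
    """Earlier scripts: the exponent covers only k*(x-U)/b."""
    return math.log((1 / b) * (1 + (k * (x - U) / b) ** (-1 / k - 1)))

def direct(k, b, x):
    """Exponent applied to the whole bracket (1 + k*(x-U)/b)."""
    return math.log((1 / b) * (1 + k * (x - U) / b) ** (-1 / k - 1))

def expanded(k, b, x):
    """Sum-of-logs form used in the final script."""
    q = -1 / k - 1
    return math.log(1 / b) + q * (math.log(b + k * (x - U)) - math.log(b))

# Reported optimum, evaluated at one arbitrary loss above the threshold.
k, b, x = 0.436245842, 32.531632983, 200.0
print("direct   :", direct(k, b, x))
print("expanded :", expanded(k, b, x))
print("misplaced:", misplaced(k, b, x))
```

The first two values agree to rounding error and the third does not, which fits the observation that the earlier scripts were driven to their bounds while the final one reproduces the Excel result.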

If I see correctly, My_DataFrame is defined in the global scope.
The problem is that obj_function accesses it (which succeeds) and then tries to modify its value (which fails), because by default you can't assign to a global variable from a local scope.

Fix:

At the beginning of obj_function, add a line:

def obj_function(Array):
    # comments
    global My_DataFrame
    for item .... # remains same

This should fix your problem.

Additional Note:

If you only wanted to read My_DataFrame, it would work without any errors and you wouldn't need the global keyword.
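To make the distinction concrete, here is a small standalone sketch using a plain dict and an integer instead of a DataFrame; all names are hypothetical. It separates three cases: changing a global object in place, rebinding a global name without `global`, and rebinding it with `global`.

```python
counter = {"hits": 0}   # module-level (global) object
total = 0               # module-level (global) name

def mutate_in_place():
    # Item assignment changes the existing object; no `global` needed.
    counter["hits"] += 1

def rebind_without_global():
    # The assignment makes `total` local, so reading it first fails.
    total = total + 1

def rebind_with_global():
    global total
    total = total + 1

mutate_in_place()
rebind_with_global()
try:
    rebind_without_global()
except UnboundLocalError as exc:
    print("rebinding without global failed:", type(exc).__name__)

print(counter["hits"], total)
```

Note that an assignment such as `My_DataFrame.iloc[row, 1] = value` is item assignment on an existing object, so it falls into the first case. The `global` line is harmless there, but it may not be what resolves the IMODE error quoted in the question.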

Also, I just wanted to acknowledge the effort you put into this. There's a proper explanation of what you want to do, relevant background information, an excellent diagram (Whiteboard is pretty great too), and even a minimal working example. This is how all SO questions should be; it would make everyone's life easier.
