
R optimise log-likelihood

I have a function that takes lambda and a data sample and returns the log-likelihood value of each data point:

data <- rpois(n=25, lambda=4)
generateLogLikelihood <- function(lambda, y){
  return(dpois(y, lambda, log=TRUE))
}

LogLikelihood = generateLogLikelihood (4,data)
LogLikelihood

I'm asking for a solution to fit this requirement:

I need to use the optimise function to return the value of lambda that maximises the log-likelihood for the given data sample, but I'm getting stuck (this is my first attempt at using optimise). I need to adapt the function to take only 'data' as an input.

Where I'm getting stuck: I'm not sure how or when to apply optimise. Given the requirement that my function takes only one input (the data, now called newdata), do I need to optimise outside the function? But my function requires lambda values, so I'm not sure how to do this.

My current code, which consists of 2 separate parts that I don't know how to combine (and which may be entirely wrong), is below:

newdata <- c(23,16,18,14,19,20,12,15,15,21)
newlambdas <- seq(min(newdata),max(newdata),0.5)

generateLogLikelihoodNew <- function(y){
  return(dpois(y, lambda, log=TRUE))
}

LogLikelihood = optimise(generateLogLikelihoodNew,newdata,lower = min(newlambdas), upper = max(newlambdas), maximum = TRUE)
LogLikelihood

If you only want to check which of the provided lambdas gives the best fit, you can do:

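# Negative log-likelihood of newdata for a single candidate lambda
generateLogLikelihoodNew <- function(lambda){
  -sum(dpois(newdata, lambda, log=TRUE))
}
# Candidate lambda with the smallest negative log-likelihood
newlambdas[which.min(sapply(newlambdas, generateLogLikelihoodNew))]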

If, however, you want to find the optimal value of lambda itself, you do not need to provide a sequence of candidate lambdas:

optimise(
  function(x){-sum(dpois(newdata,x,log=TRUE))},
  c(0,100)
)

$minimum
[1] 17.3

$objective
[1] 26.53437
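As a sanity check on the result above: for a Poisson sample the maximum-likelihood estimate of lambda has a closed form, the sample mean, so the numerical optimum should agree with mean(newdata). The following is a minimal sketch of that comparison; the names negLL, fit and closedForm are illustrative and not part of the original answer.

```r
newdata <- c(23, 16, 18, 14, 19, 20, 12, 15, 15, 21)

# Negative log-likelihood of the whole sample for a given lambda
negLL <- function(lambda) -sum(dpois(newdata, lambda, log = TRUE))

# Numerical minimiser over the same interval as above
fit <- optimise(negLL, c(0, 100))

# Closed-form Poisson MLE: the sample mean
closedForm <- mean(newdata)

# Compare the two up to a loose tolerance, since optimise() is approximate
all.equal(fit$minimum, closedForm, tolerance = 1e-3)
```

If the two disagree noticeably, the usual suspects are a missing sum in the objective or a search interval that does not contain the sample mean.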

There are several problems here:

  1. The log likelihood function defined in the question is only valid for a scalar y value. The log likelihood function for a vector y is the sum of the log likelihoods of the individual y values. Add sum to the definition.
  2. The default for optimize is to minimize, but the log likelihood has to be maximized, so specify maximum=TRUE as an argument to optimize (or else pass the negative log likelihood function).
  3. y needs to be passed to the log likelihood function. This can be done by specifying it as a named argument to optimize, which forwards additional arguments to the function being optimized.
  4. Although it is not wrong to specify lower and upper as done in the question, it is a bit shorter to pass range(newdata) to the interval argument of optimize.
  5. Although using a long name such as generateLogLikelihood is not wrong, it makes the code hard to read and can make it run off the end of the line. The word generate really adds nothing. I would choose a better name. Scientific code is often read in conjunction with its mathematical formula. Suppose that in this case the formula used ll or LL. ll is a bit hard to read, since a lower-case L and the digit one look nearly the same, so we could use LL or, if you really want to write it out, shorten it to logLikelihood. Furthermore, the variable named LogLikelihood in the code is not the log likelihood. It is a list of two components, which hold the value of lambda at the optimum and the value of the objective there. Clearly there is a certain amount of discretion in choosing names, and your opinion may differ from mine, but I found it awkward dealing with such long variable names.

Thus we have:

LL <- function(lambda, y) sum(dpois(y, lambda, log = TRUE))
optimize(LL, range(newdata), y = newdata, maximum = TRUE)
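The question also asks for a function that takes only the data as input. One way to get there, sketched here under the assumption that returning just the maximising lambda is what is wanted, is to wrap the optimize call in a function of the data alone. The name fitLambda is hypothetical and does not appear in the question or the answers.

```r
# Log-likelihood of the whole sample y for a given lambda (as defined above)
LL <- function(lambda, y) sum(dpois(y, lambda, log = TRUE))

# Hypothetical wrapper: takes only the data and returns the lambda
# that maximises the log-likelihood over the range of the data
fitLambda <- function(y) {
  optimize(LL, range(y), y = y, maximum = TRUE)$maximum
}

newdata <- c(23, 16, 18, 14, 19, 20, 12, 15, 15, 21)
fitLambda(newdata)
```

Searching over range(y) is safe for this model because the Poisson maximum-likelihood estimate is the sample mean, which always lies between the smallest and largest observation. If every observation is identical the interval collapses to a single point, so that case would need separate handling.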
