Suppose we have two normal samples, rnorm1 = rnorm(100,0,1) and rnorm2 = rnorm(100,1,1), and then we define input = c(rnorm1, rnorm2).
This question really bothers me; could anyone help?
For the simplest case, where you have 2 distributions and know the means of each, you can find the cutpoint by calculating the (log) likelihood for each possible cutpoint:
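x = rnorm(100, 0, 1)
y = rnorm(100, 1, 1)
combined = c(x, y)
n = length(combined)
# Log-likelihood of the data if the first `cutpoint` values come from N(0, 1)
# and the remaining values come from N(1, 1)
log_lik = function(cutpoint) {
  part1 = combined[1:cutpoint]
  part2 = combined[(cutpoint + 1):n]
  sum(dnorm(part1, mean = 0, log = TRUE)) +
    sum(dnorm(part2, mean = 1, log = TRUE))
}
# Stop at n - 1 so that part2 is never empty: (n + 1):n would count backwards
# and produce an NA
res = sapply(1:(n - 1), log_lik)
plot(res)
which.max(res)  # index of the last observation in the first segment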
This is just an ad hoc solution to the problem, though; for a more robust statistical procedure you probably want to look at something like changepoint analysis.
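The same scan can also be run without knowing the means, by plugging in each segment's sample mean. The following is only a minimal sketch, assuming exactly one cutpoint and a standard deviation of 1 in both segments; the helper name `profile_log_lik` and the minimum segment length `min_seg` are made up for illustration and are not part of any package.

```r
set.seed(42)
combined <- c(rnorm(100, 0, 1), rnorm(100, 1, 1))
n <- length(combined)

# Hypothetical helper: log-likelihood of a split after position `cutpoint`,
# with each segment's mean replaced by its sample mean (sd assumed to be 1)
profile_log_lik <- function(cutpoint) {
  part1 <- combined[1:cutpoint]
  part2 <- combined[(cutpoint + 1):n]
  sum(dnorm(part1, mean = mean(part1), sd = 1, log = TRUE)) +
    sum(dnorm(part2, mean = mean(part2), sd = 1, log = TRUE))
}

# Assumed minimum segment length, so each sample mean uses several points
min_seg <- 5
candidates <- min_seg:(n - min_seg)
res <- sapply(candidates, profile_log_lik)

# Estimated index of the last observation in the first segment
estimate <- candidates[which.max(res)]
estimate
```

Because the means are estimated from the same data, the curve tends to be flatter near the ends than in the known-means case, which is one reason to prefer a dedicated changepoint method for real data.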
If the population means are unknown, you can use the strucchange package.
Example assuming a unique breakpoint:
library(strucchange)
set.seed(666)
y <- c(rnorm(100,0,1), rnorm(100,1,1))
bp <- breakpoints(y ~ 1, breaks = 1) # assume a unique breakpoint
bp$breakpoints
# 102
If there's no assumption on the number of breakpoints:
library(strucchange)
set.seed(666)
y <- c(rnorm(100,0,1), rnorm(100,1,1), rnorm(100,0,1))
bp <- breakpoints(y ~ 1, breaks = NULL) # unknown number of breakpoints
bp$breakpoints
# 102, 213
changepoint is another package for detecting breakpoints. More generally, strucchange can search for breakpoints under a linear regression model on each segment (e.g. it can detect a change in the intercept or slope of a simple linear regression).
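To illustrate the regression case, here is a rough sketch using `breakpoints()` with a regressor on the right-hand side of the formula. The simulated data (a slope that changes from 0 to 0.1 after observation 100) and the variable names are assumptions made up for this example.

```r
library(strucchange)

set.seed(1)
x <- seq_len(200)
# Simulated data (an assumption for illustration): flat up to x = 100,
# then rising with slope 0.1, plus Gaussian noise
y <- ifelse(x <= 100, 0, 0.1 * (x - 100)) + rnorm(200, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Fit y ~ x within each segment and allow a single break
bp <- breakpoints(y ~ x, data = dat, breaks = 1)
bp$breakpoints

# Intercept and slope estimated on each segment
coef(bp)
```

The observations need to be in the order along which the break is expected (here, increasing x), since breakpoints are reported as observation indices.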