
How to find the cutting points of normal distributions (ordered data case) - R

Suppose we draw two normal samples, rnorm1 = rnorm(100, 0, 1) and rnorm2 = rnorm(100, 1, 1), and then define input = c(rnorm1, rnorm2).

  1. How can we find the cut point between these two distributions (here, position 100) while keeping the data in input in its original order (not reordering it at all)?
  2. Furthermore, if we have several normal distributions (say, more than 3), how can we do the same thing without specifying the number of distributions in advance?

This question has really been bothering me. Could anyone help?

For the simplest case, where you have two distributions and know the mean of each (and, as here, their common standard deviation of 1), you can find the cutpoint by computing the log-likelihood for every possible cutpoint and taking the maximum:

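x = rnorm(100, 0, 1)
y = rnorm(100, 1, 1)
combined = c(x, y)

# Log-likelihood of splitting after position `cutpoint`:
# observations up to it ~ N(0, 1), observations after it ~ N(1, 1)
log_lik = function(cutpoint) {
    part1 = combined[1:cutpoint]
    part2 = combined[(cutpoint + 1):length(combined)]
    sum(dnorm(part1, mean = 0, log = TRUE)) +
        sum(dnorm(part2, mean = 1, log = TRUE))
}

# Candidate cutpoints leave at least one observation on each side
res = sapply(1:(length(combined) - 1), log_lik)
plot(res)
which.max(res)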
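Written out, the function above evaluates, for each candidate cutpoint τ, the log-likelihood below, where φ is the standard normal density, n = length(combined), and the segment means μ1 = 0 and μ2 = 1 (with unit standard deviation) are treated as known; which.max then returns the maximizer:

```latex
\ell(\tau) = \sum_{i=1}^{\tau} \log \phi(x_i - \mu_1) + \sum_{i=\tau+1}^{n} \log \phi(x_i - \mu_2),
\qquad \hat{\tau} = \arg\max_{1 \le \tau < n} \ell(\tau)
```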

This is just an ad hoc solution to the problem, though. For more robust statistical procedures, you probably want to look at changepoint analysis.

If the population means are unknown, you can use the strucchange package.

Example assuming a unique breakpoint:

library(strucchange)
set.seed(666)
y <- c(rnorm(100,0,1), rnorm(100,1,1))
bp <- breakpoints(y ~ 1, breaks = 1) # assume a unique breakpoint
bp$breakpoints
# 102
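For intuition: with y ~ 1 and breaks = 1, breakpoints() looks for the split that minimizes the residual sum of squares when each segment is fitted by its own mean. The base-R sketch below does the same exhaustive search. best_split and min_size are hypothetical names, and using a 15% minimum segment length to mirror strucchange's default h = 0.15 is an assumption about comparable settings, so treat any agreement with the strucchange result as expected but not guaranteed.

```r
# Minimal sketch: single mean-shift breakpoint with unknown segment means.
# For each candidate split k, fit each segment by its own mean and add up
# the residual sums of squares; the best split minimizes the total RSS.
# `best_split` and `min_size` are hypothetical names for illustration.
best_split <- function(y, min_size = 1) {
  n <- length(y)
  candidates <- min_size:(n - min_size)
  rss <- sapply(candidates, function(k) {
    left <- y[1:k]
    right <- y[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  candidates[which.min(rss)]  # last index of the first segment
}

set.seed(666)
y <- c(rnorm(100, 0, 1), rnorm(100, 1, 1))
# Assumed minimum segment length: 15% of n, like strucchange's default h
best_split(y, min_size = floor(0.15 * length(y)))
```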

If you make no assumption about the number of breakpoints:

library(strucchange)
set.seed(666)
y <- c(rnorm(100,0,1), rnorm(100,1,1), rnorm(100,0,1))
bp <- breakpoints(y ~ 1, breaks = NULL) # unknown number of breakpoints
bp$breakpoints
# 102, 213
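To illustrate one package-free way of handling an unknown number of breakpoints, here is a minimal binary-segmentation sketch. Note that this is a different, greedy technique: strucchange finds optimal breakpoints by dynamic programming and selects their number by BIC. binseg, min_size and the penalty pen are hypothetical choices; the penalty 2 * log(n) compares RSS reductions on a log-likelihood scale and therefore assumes unit variance, as in the simulated data.

```r
# Minimal sketch of binary segmentation for an unknown number of mean shifts.
# Assumes unit variance, so an RSS reduction equals 2 x log-likelihood gain.
# `binseg`, `min_size` and `pen` are hypothetical names/choices.
binseg <- function(y, min_size = 10, pen = 2 * log(length(y))) {
  # Best split of y[from:to]; returns its index in y, or NA if not worth it
  split_once <- function(from, to) {
    seg <- y[from:to]
    m <- length(seg)
    if (m < 2 * min_size) return(NA)
    rss0 <- sum((seg - mean(seg))^2)
    ks <- min_size:(m - min_size)
    rss <- sapply(ks, function(k) {
      l <- seg[1:k]
      r <- seg[(k + 1):m]
      sum((l - mean(l))^2) + sum((r - mean(r))^2)
    })
    # Accept the split only if it reduces RSS by more than the penalty
    if (rss0 - min(rss) > pen) from - 1 + ks[which.min(rss)] else NA
  }
  found <- integer(0)
  queue <- list(c(1, length(y)))  # segments still to examine
  while (length(queue) > 0) {
    seg <- queue[[1]]
    queue <- queue[-1]
    k <- split_once(seg[1], seg[2])
    if (!is.na(k)) {
      found <- c(found, k)
      queue <- c(queue, list(c(seg[1], k), c(k + 1, seg[2])))
    }
  }
  sort(found)
}

set.seed(666)
y <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0, 1))
binseg(y)
```

Binary segmentation is cheap and easy to follow, but because each split is chosen greedily it can miss configurations that the exact dynamic-programming search in strucchange would find.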

The changepoint package is another option for detecting breakpoints.
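A minimal sketch with the changepoint package, assuming it is installed: cpt.mean() searches for changes in mean, and with method = "PELT" the number of changepoints is chosen by a penalized cost rather than fixed in advance. The simulated data reuse the three-segment example above; no particular output is claimed.

```r
library(changepoint)

set.seed(666)
y <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0, 1))

# PELT search for changes in mean; the number of changepoints is not fixed
fit <- cpt.mean(y, method = "PELT")
cpts(fit)  # estimated changepoint locations
```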

More generally, strucchange can search for breakpoints assuming a linear regression model on each segment (e.g., it can detect a change in the intercept and/or slope of a simple linear regression).
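To show what a single break in a simple linear regression means, here is a base-R sketch that fits lm(y ~ x) separately on each side of every admissible split and keeps the split with the smallest total residual sum of squares; with strucchange the corresponding call would be breakpoints(y ~ x, breaks = 1). slope_break and min_size are hypothetical names, and the simulated data are for illustration only.

```r
# Minimal sketch: one break in a simple linear regression y ~ x.
# For every admissible split, fit lm() separately to each segment and keep
# the split with the smallest total residual sum of squares.
# `slope_break` and `min_size` are hypothetical names for illustration.
slope_break <- function(x, y, min_size = 5) {
  n <- length(y)
  ks <- min_size:(n - min_size)
  rss <- sapply(ks, function(k) {
    left <- 1:k
    right <- (k + 1):n
    sum(resid(lm(y[left] ~ x[left]))^2) +
      sum(resid(lm(y[right] ~ x[right]))^2)
  })
  ks[which.min(rss)]  # last index of the first segment
}

set.seed(1)
x <- 1:100
# Slope 0.5 for the first 60 points, then slope -0.5, plus noise
y <- ifelse(x <= 60, 0.5 * x, 30 - 0.5 * (x - 60)) + rnorm(100, sd = 1)
slope_break(x, y)
```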
