I have a vector of integers that I wish to divide into clusters so that the distance between any two clusters is greater than a lower bound and, within any cluster, the distance between any two elements is less than an upper bound.
For example, suppose we have the following vector:
1, 4, 5, 6, 9, 29, 32, 36
If we set the lower bound and upper bound to 19 and 9 respectively, the two vectors below should be a possible result:
1, 4, 5, 6, 9
29, 32, 36
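To make the two conditions concrete, here is a small R checker for a proposed clustering. It is a sketch under two assumptions not spelled out in the question: the "distance between two clusters" is taken to be the smallest distance between a member of one and a member of the other, and the within-cluster condition applies to every pair of elements. `meets_bounds` is a hypothetical name.

```r
# Hypothetical checker for a proposed clustering (a list of numeric vectors).
# Assumes: between-cluster distance = smallest gap between members of two
# different clusters; within-cluster condition applies to every pair.
meets_bounds <- function(clusters, lower_bound, upper_bound) {
  # Every pair inside a cluster is closer than upper_bound
  # exactly when the cluster's range (max - min) is below upper_bound
  within_ok <- all(vapply(clusters,
                          function(v) max(v) - min(v) < upper_bound,
                          logical(1)))
  # Every pair of clusters must be more than lower_bound apart
  between_ok <- TRUE
  k <- length(clusters)
  for (i in seq_len(k - 1)) {
    for (j in seq(i + 1, k)) {
      gap <- min(abs(outer(clusters[[i]], clusters[[j]], "-")))
      if (gap <= lower_bound) between_ok <- FALSE
    }
  }
  within_ok && between_ok
}

meets_bounds(list(c(1, 4, 5, 6, 9), c(29, 32, 36)), lower_bound = 19, upper_bound = 9)
```

For the example above this returns `TRUE`: the ranges are 8 and 7, and the closest members of the two clusters (9 and 29) are 20 apart.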
Thanks to @flodel's comments, I realized this kind of clustering may be impossible, so I would like to modify the question a bit:
What are the possible clustering methods if I impose only the between-cluster distance lower bound? What are the possible clustering methods if I impose only the within-cluster distance upper bound?
What are the possible clustering methods if I impose only the between-cluster distance lower bound?
Hierarchical clustering with single linkage:
x <- c(1, 4, 5, 6, 9, 29, 32, 46, 55)
tree <- hclust(dist(x), method = "single")
split(x, cutree(tree, h = 19))
# $`1`
# [1] 1 4 5 6 9
#
# $`2`
# [1] 29 32 46 55
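For one-dimensional data, cutting a single-linkage tree at height `h` should give the same groups as splitting the sorted vector wherever the gap between neighbours exceeds `h`, because each single-linkage merge height is just the gap between the two adjacent groups. A quick R check of that claim on the example above (`gap_groups` is a name introduced here for illustration):

```r
x <- c(1, 4, 5, 6, 9, 29, 32, 46, 55)  # already sorted
h <- 19
tree <- hclust(dist(x), method = "single")
hc_groups <- cutree(tree, h = h)
# Start a new group wherever the gap to the previous element exceeds h
gap_groups <- cumsum(c(1, diff(x) > h))
stopifnot(all(hc_groups == gap_groups))
```

Both give groups 1, 1, 1, 1, 1, 2, 2, 2, 2. Note the chaining: 29 and 55 end up together even though they are 26 apart, because only the lower bound on the between-cluster gap is enforced.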
What are the possible clustering methods if I impose only the within-cluster distance upper bound?
Hierarchical clustering with complete linkage:
x <- c(1, 4, 5, 6, 9, 20, 26, 29, 32)
tree <- hclust(dist(x), method = "complete")
split(x, cutree(tree, h = 9))
# $`1`
# [1] 1 4 5 6 9
#
# $`2`
# [1] 20
#
# $`3`
# [1] 26 29 32
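One way to confirm the upper bound holds is to compute each cluster's largest pairwise distance, which for 1-D data is simply its range. This R sketch reuses the example above; `diameters` is an illustrative name.

```r
x <- c(1, 4, 5, 6, 9, 20, 26, 29, 32)
tree <- hclust(dist(x), method = "complete")
clusters <- split(x, cutree(tree, h = 9))
# Largest pairwise distance inside each cluster (max - min for 1-D data)
diameters <- vapply(clusters, function(v) max(v) - min(v), numeric(1))
stopifnot(all(diameters <= 9))
```

Here the ranges are 8, 0 and 6. As far as I can tell, `cutree()` keeps merges whose height is at most `h`, so the guarantee is "diameter ≤ 9" rather than strictly less than 9; cutting at a slightly smaller `h` would enforce a strict bound.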
Here's a simple algorithm that will work, explained conceptually (implementation details omitted):
1. Sort the vector.
2. Find all pairs of adjacent elements that are more than lower_bound apart. These mark all the possible cluster boundaries.
3. For each pair of adjacent markers, left_marker and right_marker, check whether the element immediately to the right of left_marker and the element immediately to the left of right_marker are less than upper_bound apart.
Applying this to your example, we get:
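A minimal R sketch of the three steps above, applied to the example vector. It assumes the two ends of the vector act as implicit markers (so the first and last segments are checked too) and that the input is non-empty; `cluster_with_bounds` and its return convention (`NULL` when no valid clustering exists) are hypothetical, not part of the answer.

```r
# Hypothetical helper: returns the clusters as a list,
# or NULL when no clustering satisfies both bounds.
cluster_with_bounds <- function(vec, lower_bound, upper_bound) {
  # Step 1: sort
  vec <- sort(vec)
  n <- length(vec)
  # Step 2: a boundary follows every position whose gap to the next
  # element exceeds lower_bound
  markers <- which(diff(vec) > lower_bound)
  # Treat both ends of the vector as markers; segment i covers starts[i]..ends[i]
  starts <- c(1, markers + 1)
  ends <- c(markers, n)
  # Step 3: in a sorted segment the farthest pair is its first and last element
  spans <- vec[ends] - vec[starts]
  if (any(spans >= upper_bound)) return(NULL)
  split(vec, rep(seq_along(starts), ends - starts + 1))
}

res <- cluster_with_bounds(c(1, 4, 5, 6, 9, 29, 32, 36),
                           lower_bound = 19, upper_bound = 9)
```

This yields the two clusters 1, 4, 5, 6, 9 and 29, 32, 36 (ranges 8 and 7). Returning `NULL` when a segment is too wide is justified because every gap inside a segment is at most lower_bound, so cutting the segment anywhere would break the lower-bound condition.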
EDIT: The original poster relaxed the conditions of the problem.
If you only want to satisfy the lower bound condition:
1. Sort the vector.
2. Find all pairs of adjacent elements that are more than lower_bound apart.
The following gets you step 2, assuming your vector is already sorted:
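# Given (vec must already be sorted)
vec <- c(1, 4, 5, 6, 9, 29, 32, 36)
lower_bound <- 19
# TRUE when the gap after position x exceeds lower_bound
f <- function(x) {
  vec[x + 1] - vec[x] > lower_bound
}
indices <- seq_len(length(vec) - 1)
# Filter() keeps every index where f is TRUE; Position() would return only the first
marker_positions <- Filter(f, indices)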
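To go from marker positions to the clusters themselves, one option is to start a new group right after each marker. This R sketch uses the vectorised `which(diff(vec) > lower_bound)`, which returns every index where the gap to the next element exceeds `lower_bound`; `group_id` and `clusters` are illustrative names.

```r
vec <- c(1, 4, 5, 6, 9, 29, 32, 36)  # assumed sorted
lower_bound <- 19
# Every index i with vec[i + 1] - vec[i] > lower_bound
marker_positions <- which(diff(vec) > lower_bound)
# A new cluster starts at the element right after each marker
group_id <- 1 + cumsum(seq_along(vec) %in% (marker_positions + 1))
clusters <- split(vec, group_id)
```

Here `marker_positions` is 5 (the gap from 9 to 29), giving the clusters 1, 4, 5, 6, 9 and 29, 32, 36.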