Assigning clusters/groups based on two sequential variables in R

Context: I have some spatial point data (i.e. lon/lat coordinates), and each point is associated with a date. I've clustered points that are close together, but I now want to split these clusters into groups so that, when sorted by date, each group is a contiguous run of observations from a single cluster. Dates can have gaps, and I only want to split when an observation from another cluster fully divides a group, i.e. it is not just on the edge.

Essentially, given the cluster and day fields below, I want to generate desired.

   cluster day desired
1        1   1       1
2        1   1       1
3        1   2       1
4        1   4       1
5        2   6       2
6        2   7       2
7        2   8       2
8        1   8       3
9        3   9       4
10       3  12       4
11       3  12       4
12       2  12       5
13       2  14       5
14       3  18       6
15       3  19       6
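
Read row by row, the desired column in this table is a running count of consecutive runs of the same cluster label. A minimal base R sketch of that reading, assuming the rows are already in the order shown (the names `tbl`, `runs` and `run_id` are illustrative and not from the original post):

```r
# Rebuild the table above; `tbl` is an illustrative name
tbl <- data.frame(
  cluster = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 3, 2, 2, 3, 3),
  day     = c(1, 1, 2, 4, 6, 7, 8, 8, 9, 12, 12, 12, 14, 18, 19),
  desired = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6)
)

# rle() finds runs of identical consecutive cluster labels
runs <- rle(tbl$cluster)

# Number the runs, then repeat each number by the length of its run
tbl$run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Compare the run ids with the desired column
identical(tbl$run_id, as.integer(tbl$desired))
```

This only restates the target; the real work is getting the rows into this order, including how rows sharing the same day are arranged.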

Here's a complete example. Note that the spatial coordinates are essentially irrelevant; I've just included them for completeness. Also, in my actual dataset day is a date object, but I've used an integer for simplicity.

library(ggplot2)
pts <- data.frame(rbind(
  cbind(lon = rnorm(5, 0, 0.1), lat = rnorm(5, 0, 0.1), 
        day = c(1, 1, 2, 4, 8)),
  cbind(lon = rnorm(5, 1, 0.1), lat = rnorm(5, 1, 0.1), 
        day = c(6, 7, 8, 12, 14)),
  cbind(lon = rnorm(5, 1, 0.1), lat = rnorm(5, 0, 0.1), 
        day = c(9, 12, 12, 18, 19))
))
hc <- hclust(dist(pts[c("lon", "lat")]))
pts$cluster <- cutree(hc, k = 3)
ggplot(pts) +
  geom_text(aes(lat, lon, label = day, col = as.factor(cluster)))

First plot

The grouping I want is this:

pts$desired <- c(1, 1, 1, 1, 3, 
                 2, 2, 2, 5, 5,
                 4, 4, 4, 6, 6)
ggplot(pts) +
  geom_text(aes(lat, lon, label = day, col = as.factor(desired)))

Second plot

This solution comes courtesy of @docendodiscimus in the comments to the original question.

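library(dplyr)
pts <- pts %>% 
  # sort by day; on equal days, put the higher cluster label first
  arrange(day, desc(cluster)) %>% 
  # start a new group whenever the cluster label changes between consecutive rows
  mutate(new_cluster = cumsum(c(1L, diff(cluster) != 0)))
# TRUE when the computed groups match the desired ones
all.equal(pts$desired, pts$new_cluster)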
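
The same idea can be sketched in base R without dplyr. This is a sketch under stated assumptions: `pts2` and the origin date 2020-01-01 are illustrative, and the descending-cluster tie-break on equal days is carried over from the code above, where it happens to produce the desired grouping; other data may need a different tie rule. It uses a Date column, since the question says the real data holds dates, and it keeps the original row order.

```r
# Illustrative data: the question's cluster/day values, with day as a Date
# (the origin date is an arbitrary assumption)
pts2 <- data.frame(
  cluster = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  day = as.Date("2020-01-01") +
    c(1, 1, 2, 4, 8, 6, 7, 8, 12, 14, 9, 12, 12, 18, 19)
)

# Sort by day, breaking same-day ties by descending cluster label
ord <- order(pts2$day, -pts2$cluster)
sorted_cluster <- pts2$cluster[ord]

# Start a new group whenever the cluster label changes between consecutive rows
group_sorted <- cumsum(c(TRUE, diff(sorted_cluster) != 0))

# Write the group ids back in the original row order
pts2$new_cluster <- NA_integer_
pts2$new_cluster[ord] <- group_sorted

pts2
```

Writing the ids back through `ord` avoids reordering the data frame, which is convenient if other columns or plots depend on the original row order.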
