Create a new order for existing column values without reordering the rows of a data frame - R

Question

I have some cluster labels from k-means runs done separately on different ids (reprex below). The problem is that the k-means cluster codes are not ordered consistently across ids, although all ids have 3 clusters.

reprex <- data.frame(id = rep(1:2, each = 4),
                     v1 = rep(1:4, 2),
                     cluster = c(2, 2, 1, 3, 3, 1, 2, 2))

reprex
   id v1 cluster
1  1  1       2
2  1  2       2
3  1  3       1
4  1  4       3
5  2  1       3
6  2  2       1
7  2  3       2
8  2  4       2

What I want is for the variable `cluster` to always start with 1 within each ID, with the remaining codes numbered in the order they first appear. Note that I don't want to reorder the data frame by cluster; the row order needs to remain the same. So the desired result would be:

reprex_desired <- data.frame(id = rep(1:2, each = 4),
                             v1 = rep(1:4, 2),
                             cluster = c(2, 2, 1, 3, 3, 1, 2, 2),
                             what_iWant = c(1, 1, 2, 3, 1, 2, 3, 3))

reprex_desired
  id v1 cluster what_iWant
1  1  1       2          1
2  1  2       2          1
3  1  3       1          2
4  1  4       3          3
5  2  1       3          1
6  2  2       1          2
7  2  3       2          3
8  2  4       2          3
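The requirement can also be written as a check. This is a minimal base-R sketch, where `is_first_appearance_labels` is a hypothetical helper (not part of the question): within one id, the new labels must map one-to-one onto the original cluster codes and, read in row order, must first appear as 1, 2, 3, ... It rebuilds `reprex_desired` so it runs on its own.

```r
# Hypothetical helper (not from the question): TRUE if `new` relabels `old`
# one-to-one, numbering labels in order of first appearance.
is_first_appearance_labels <- function(old, new) {
  # Each original code must map to exactly one new label and vice versa
  pairs <- unique(data.frame(old = old, new = new))
  one_to_one <- !anyDuplicated(pairs$old) && !anyDuplicated(pairs$new)
  # The new labels, in order of first appearance, must be 1, 2, 3, ...
  firsts <- unique(new)
  in_order <- identical(as.numeric(firsts), as.numeric(seq_along(firsts)))
  one_to_one && in_order
}

reprex_desired <- data.frame(id = rep(1:2, each = 4),
                             v1 = rep(1:4, 2),
                             cluster = c(2, 2, 1, 3, 3, 1, 2, 2),
                             what_iWant = c(1, 1, 2, 3, 1, 2, 3, 3))

# Run the check separately for each id; the data frame itself is not sorted
checks <- sapply(split(reprex_desired, reprex_desired$id),
                 function(d) is_first_appearance_labels(d$cluster, d$what_iWant))
print(checks)
```

Keeping the raw `cluster` labels (e.g. `c(2, 2, 1)`) would fail this check, because their first appearances read 2, 1 rather than 1, 2.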

Answer 1

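We can use `match` after grouping by 'id'. Within each id, `unique(cluster)` lists the labels in order of first appearance, so `match` maps each label to its position in that list: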

library(dplyr)
reprex <- reprex %>%
  group_by(id) %>%
  # relabel within each id; the row order is unchanged
  mutate(what_IWant = match(cluster, unique(cluster))) %>%
  ungroup()

-output

reprex
# A tibble: 8 × 4
     id    v1 cluster what_IWant
  <int> <int>   <dbl>      <int>
1     1     1       2          1
2     1     2       2          1
3     1     3       1          2
4     1     4       3          3
5     2     1       3          1
6     2     2       1          2
7     2     3       2          3
8     2     4       2          3
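If dplyr is not available, the same `match(x, unique(x))` idea can be applied per id with base R's `ave()`, which returns its results aligned with the original rows. This is a sketch that swaps in base R for dplyr, not part of the original answer; the column name `what_iWant` follows the question.

```r
# Base-R sketch: relabel clusters by order of first appearance within each id
reprex <- data.frame(id = rep(1:2, each = 4),
                     v1 = rep(1:4, 2),
                     cluster = c(2, 2, 1, 3, 3, 1, 2, 2))

# ave() applies FUN to each id group and writes the results back in the
# original row order, so no sorting happens
reprex$what_iWant <- ave(reprex$cluster, reprex$id,
                         FUN = function(x) match(x, unique(x)))
print(reprex)
```

Because `ave()` assigns into a copy of `cluster`, the new column keeps the numeric (double) type of `cluster`.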

Answer 2

Here is a version with `cumsum` combined with `lag`:

library(dplyr)
reprex %>% 
  group_by(id) %>% 
  # start at 1 and add 1 each time cluster differs from the previous row
  mutate(what_i_want = cumsum(cluster != lag(cluster, default = first(cluster))) + 1)

     id    v1 cluster what_i_want
  <int> <int>   <dbl>       <dbl>
1     1     1       2           1
2     1     2       2           1
3     1     3       1           2
4     1     4       3           3
5     2     1       3           1
6     2     2       1           2
7     2     3       2           3
8     2     4       2           3
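The `cumsum`/`lag` approach counts changes between consecutive rows, so it agrees with the `match` approach only when each cluster label forms a single contiguous run within an id, as it does in the reprex. The base-R sketch here reproduces both approaches on one id's labels; the function names and the `interleaved` vector are hypothetical, chosen only for illustration.

```r
# Hypothetical base-R stand-ins for the two answers, applied to one id.
# run_labels mirrors Answer 2: new label whenever the value changes from the
# previous row (the first row is compared with itself, like lag's default).
run_labels <- function(x) cumsum(x != c(x[1], head(x, -1))) + 1
# first_seen_labels mirrors Answer 1: number labels by first appearance.
first_seen_labels <- function(x) match(x, unique(x))

contiguous <- c(2, 2, 1, 3)   # id 1 from the reprex
interleaved <- c(2, 1, 2, 3)  # hypothetical: cluster 2 reappears later

print(run_labels(contiguous))          # 1 1 2 3
print(first_seen_labels(contiguous))   # 1 1 2 3
print(run_labels(interleaved))         # 1 2 3 4
print(first_seen_labels(interleaved))  # 1 2 1 3
```

Note also that `cumsum(...) + 1` produces a double while `match()` produces an integer, which is why the two outputs show `<dbl>` and `<int>` respectively.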

2 answers

Answer 1
Score 3, accepted, 2022-06-09 18:00:21

Answer 2
Score 2, 2022-06-09 18:15:36
