Transition matrix for cluster analysis in R

Question

I have the following dataset, where the column clust is the initial cluster and lt_clust is the resulting cluster after some time:

dataset <- data.frame(Id = c(101, 102, 103, 104, 105, 106, 107, 108, 
                             109, 110, 111, 112, 113, 114), 
                      clust = c("k1", "k1", "k1", "k1","k1", "k2", "k2", 
                                "k2", "k2", "k2", "k3", "k3", "k3", "k3"), 
                      lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3", 
                                   "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
                      stringsAsFactors = FALSE)

Now I want to test how accurate I was when assigning the final cluster, so the expected result is:

  clust lt_clust rate
  <fct> <fct>    <dbl>
1 k1    k1         0.8
2 k1    k2         0.2
3 k1    k3           0
4 k2    k1         0.2
5 k2    k2         0.6
6 k2    k3         0.2
7 k3    k1        0.25
8 k3    k2           0
9 k3    k3        0.75

This was my first attempt:

dataset %>% 
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick)) %>%
  ungroup() %>%
  group_by(clust, ) %>%
  summarise(rate = total / sum(total))

But this fails to keep the lt_clust column:

  clust  rate
  <fct> <dbl>
1 k1     0.8 
2 k1     0.2 
3 k1     0   
4 k2     0.2 
5 k2     0.6 
6 k2     0.2 
7 k3     0.25
8 k3     0   
9 k3     0.75

And when I try this, the result is wrong too:

dataset %>% 
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick),
            rate = total / sum(total))

  clust lt_clust total  rate
  <fct> <fct>    <dbl> <dbl>
1 k1    k1           4     1
2 k1    k2           1     1
3 k1    k3           0   NaN
4 k2    k1           1     1
5 k2    k2           3     1
6 k2    k3           1     1
7 k3    k1           1     1
8 k3    k2           0   NaN
9 k3    k3           3     1

Could you please help me spot what I am doing wrong in the code? I am trying to do this with the dplyr package.

Answer 1

Starting from your first attempt, just add lt_clust on its own as an argument to summarise(), so that the column is kept in the result:

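library(dplyr)

dataset %>% 
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  # .drop = FALSE keeps empty factor combinations (e.g. k1 -> k3) with total = 0
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick)) %>%
  ungroup() %>%
  group_by(clust) %>%
  # summarise() only returns the grouping columns plus what it is given,
  # so lt_clust has to be listed explicitly to survive this step
  summarise(lt_clust, rate = total / sum(total))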

# A tibble: 9 × 3
# Groups:   clust [3]
  clust lt_clust  rate
  <fct> <fct>    <dbl>
1 k1    k1        0.8 
2 k1    k2        0.2 
3 k1    k3        0   
4 k2    k1        0.2 
5 k2    k2        0.6 
6 k2    k3        0.2 
7 k3    k1        0.25
8 k3    k2        0   
9 k3    k3        0.75
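
For context on the two attempts in the question: the first one drops lt_clust because summarise() returns only the grouping columns and the columns it is asked to compute, and the second one still groups by both clust and lt_clust, so each group holds a single total and total / sum(total) is always 1 (or NaN for 0 / 0). As an alternative, here is a minimal sketch that uses count() followed by a grouped mutate(), which avoids returning several rows per group from summarise(); as far as I know, that pattern is deprecated in favour of reframe() from dplyr 1.1.0 onward. The names rates and transition are introduced here for illustration only, and the base R prop.table() call is added as a cross-check rather than as part of the original answer.

```r
library(dplyr)

# Same data as in the question (Id written as a sequence for brevity)
dataset <- data.frame(
  Id = 101:114,
  clust = rep(c("k1", "k2", "k3"), times = c(5, 5, 4)),
  lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3",
               "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
  stringsAsFactors = FALSE
)

# Long format: one row per (initial cluster, final cluster) pair
rates <- dataset %>%
  mutate(clust = factor(clust), lt_clust = factor(lt_clust)) %>%
  # .drop = FALSE keeps pairs that never occur (e.g. k1 -> k3) with n = 0
  count(clust, lt_clust, .drop = FALSE) %>%
  group_by(clust) %>%
  # mutate() keeps every row, so lt_clust is not lost here
  mutate(rate = n / sum(n)) %>%
  ungroup() %>%
  select(clust, lt_clust, rate)

# Wide format: contingency table normalised by row (margin = 1),
# i.e. each row gives the shares of one initial cluster
transition <- prop.table(table(dataset$clust, dataset$lt_clust), margin = 1)

print(rates)
print(transition)
```

Each row of transition sums to 1, and its entries should agree with the rate column of rates, which makes the two results easy to compare against each other.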

Answer 1: accepted, 2021-12-06 21:25:26
