
Transition matrix for cluster analysis in R

I have the following dataset, where the column clust is the initial cluster and lt_clust is the cluster each observation ended up in after some time:

dataset <- data.frame(Id = c(101, 102, 103, 104, 105, 106, 107, 108, 
                             109, 110, 111, 112, 113, 114), 
                      clust = c("k1", "k1", "k1", "k1","k1", "k2", "k2", 
                                "k2", "k2", "k2", "k3", "k3", "k3", "k3"), 
                      lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3", 
                                   "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
                      stringsAsFactors = FALSE)

Now I want to test how accurate I was when assigning the final cluster, that is, the share of each initial cluster that ends up in each final cluster. The expected result is:

  clust lt_clust rate
  <fct> <fct>    <dbl>
1 k1    k1         0.8
2 k1    k2         0.2
3 k1    k3           0
4 k2    k1         0.2
5 k2    k2         0.6
6 k2    k3         0.2
7 k3    k1        0.25
8 k3    k2           0
9 k3    k3        0.75
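The expected table is a row-normalised transition matrix: the counts for each initial cluster are divided by that cluster's size. As a quick cross-check that does not depend on dplyr, base R's table() and prop.table() give the same numbers in matrix form. This is a sketch that assumes every cluster label occurs in both columns; if not, convert both columns to factors with the same levels first so that empty rows or columns are kept.

```r
# Example data from the question
dataset <- data.frame(Id = 101:114,
                      clust = c("k1", "k1", "k1", "k1", "k1", "k2", "k2",
                                "k2", "k2", "k2", "k3", "k3", "k3", "k3"),
                      lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3",
                                   "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
                      stringsAsFactors = FALSE)

# Cross-tabulate: rows = initial cluster, columns = final cluster.
# table() also reports zero cells such as k1 -> k3.
counts <- table(clust = dataset$clust, lt_clust = dataset$lt_clust)

# margin = 1 divides each row by its row total, so every row sums to 1
trans <- prop.table(counts, margin = 1)
print(trans)
```

as.data.frame(trans) turns the matrix back into long form (columns clust, lt_clust and Freq), although its rows are ordered by lt_clust, with clust varying fastest.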

This was my first attempt:

library(dplyr)

dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick)) %>%
  ungroup() %>%
  group_by(clust) %>%
  summarise(rate = total / sum(total))

But the lt_clust column is missing from the result:

  clust  rate
  <fct> <dbl>
1 k1     0.8 
2 k1     0.2 
3 k1     0   
4 k2     0.2 
5 k2     0.6 
6 k2     0.2 
7 k3     0.25
8 k3     0   
9 k3     0.75
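The column goes missing because summarise() returns only the grouping columns plus the summaries it computes; after group_by(clust) that leaves just clust and rate. One alternative, sketched here and not taken from the answer below, is to do the final division with mutate(), which keeps every existing column, and to pass .groups = "drop" so the second group_by() starts from ungrouped data:

```r
library(dplyr)

dataset <- data.frame(Id = 101:114,
                      clust = c("k1", "k1", "k1", "k1", "k1", "k2", "k2",
                                "k2", "k2", "k2", "k3", "k3", "k3", "k3"),
                      lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3",
                                   "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
                      stringsAsFactors = FALSE)

result <- dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  # One row per (clust, lt_clust) pair; .drop = FALSE keeps empty pairs such as k1 -> k3
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick), .groups = "drop") %>%
  # Regroup by the initial cluster only, so sum(total) is that cluster's size
  group_by(clust) %>%
  # mutate() adds rate without discarding lt_clust or total
  mutate(rate = total / sum(total)) %>%
  ungroup() %>%
  select(clust, lt_clust, rate)

print(result)
```

count(clust, lt_clust, .drop = FALSE) is a shorter way to get the same per-pair counts without the helper tick column.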

And when I try this, the result is wrong too:

dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick),
            rate = total / sum(total))

  clust lt_clust total  rate
  <fct> <fct>    <dbl> <dbl>
1 k1    k1           4     1
2 k1    k2           1     1
3 k1    k3           0   NaN
4 k2    k1           1     1
5 k2    k2           3     1
6 k2    k3           1     1
7 k3    k1           1     1
8 k3    k2           0   NaN
9 k3    k3           3     1
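Every rate is 1 (or NaN) here because, inside summarise(), each new column is visible to the arguments that follow it, and the data are still grouped by both clust and lt_clust. So sum(total) is the sum of a single number, the total just computed for that pair, which gives total / total = 1, or 0 / 0 = NaN for empty pairs. A small example with a hypothetical toy data frame shows the same effect:

```r
library(dplyr)

# Hypothetical toy data: two groups of different size
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 1, 1))

res <- toy %>%
  group_by(g) %>%
  summarise(total = sum(x),             # 2 for "a", 1 for "b"
            rate = total / sum(total))  # sum(total) is just this group's own total

print(res)
```

The division therefore has to happen after regrouping by clust alone, as in the first attempt.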

Could you help me spot what I am doing wrong in the code? I am trying to do this with the dplyr package.

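Starting from your first attempt, just add lt_clust on its own to the final summarise(). summarise() keeps only the grouping columns and the columns named in the call, so listing lt_clust carries it into the result: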

dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick)) %>%
  ungroup() %>%
  group_by(clust) %>%
  summarise(lt_clust, rate = total / sum(total))

# A tibble: 9 × 3
# Groups:   clust [3]
  clust lt_clust  rate
  <fct> <fct>    <dbl>
1 k1    k1        0.8 
2 k1    k2        0.2 
3 k1    k3        0   
4 k2    k1        0.2 
5 k2    k2        0.6 
6 k2    k3        0.2 
7 k3    k1        0.25
8 k3    k2        0   
9 k3    k3        0.75
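Since dplyr 1.1.0, a summarise() call that returns more than one row per group, as the last step above does, is deprecated and emits a warning recommending reframe() instead. A sketch of the same pipeline using reframe(), assuming dplyr 1.1.0 or later; unlike the output above, the result is ungrouped:

```r
library(dplyr)  # reframe() requires dplyr >= 1.1.0

dataset <- data.frame(Id = 101:114,
                      clust = c("k1", "k1", "k1", "k1", "k1", "k2", "k2",
                                "k2", "k2", "k2", "k3", "k3", "k3", "k3"),
                      lt_clust = c("k2", "k1", "k1", "k1", "k1", "k2", "k3",
                                   "k1", "k2", "k2", "k3", "k3", "k1", "k3"),
                      stringsAsFactors = FALSE)

result <- dataset %>%
  mutate(clust = as.factor(clust),
         lt_clust = as.factor(lt_clust),
         tick = 1) %>%
  group_by(clust, lt_clust, .drop = FALSE) %>%
  summarise(total = sum(tick), .groups = "drop") %>%
  group_by(clust) %>%
  # reframe() may return any number of rows per group (three per clust here)
  reframe(lt_clust, rate = total / sum(total))

print(result)
```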
