Identify clusters of identical values in a column of 1s and 0s

Question

This may be a silly question, but I am a beginner and have not been able to find an answer anywhere else.

Given the column in the example below, is there a way for R to automatically identify clusters of 1s and 0s, so that I can easily count how many there are in total (in this case, three clusters of 1s and three clusters of 0s)?

Thank you in advance.

> my_column = matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))
> my_column
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    0
 [7,]    0
 [8,]    0
 [9,]    0
[10,]    1
[11,]    1
[12,]    1
[13,]    0
[14,]    0
[15,]    0
[16,]    0
[17,]    0
[18,]    1
[19,]    1
[20,]    1
[21,]    1
[22,]    1
[23,]    1
[24,]    0
[25,]    0
[26,]    0

Answer 1

We can use rle and table:

table(rle(my_column[,1])$values)

Output:

0 1 
3 3
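To see why this works, it may help to look at what rle returns before table is applied. The following is a minimal sketch using the column from the question; the names runs and n_runs are only illustrative.

```r
my_column <- matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))

# rle() returns the value and the length of every run of consecutive equal entries
runs <- rle(my_column[, 1])

runs$values   # 1 0 1 0 1 0
runs$lengths  # 5 4 3 5 6 3

# Tabulating the run values counts how many runs each value has
n_runs <- table(runs$values)
```

Each element of runs$values is one cluster, so counting how often each value appears there gives the number of clusters per value, while runs$lengths holds the size of each cluster.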

Answer 2

One approach is to assign a group id to each row. An easy way is to find the change points: compute the difference between entry i and entry i + 1 and take the absolute value, which is 1 wherever the value changes and 0 elsewhere. Then use the cumsum function to turn those flags into an id for each group:

my_column = matrix(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0))
new_column <- abs(c(0, my_column[-length(my_column)] - my_column[-1]))
groups <- cumsum(new_column)

my_mat <- cbind(original = my_column, new_column = new_column, group = groups)
> my_mat
        new_column  group
 [1,] 1          0      0
 [2,] 1          0      0
 [3,] 1          0      0
 [4,] 1          0      0
 [5,] 1          0      0
 [6,] 0          1      1
 [7,] 0          0      1
 [8,] 0          0      1
 [9,] 0          0      1
[10,] 1          1      2
[11,] 1          0      2
[12,] 1          0      2
[13,] 0          1      3
[14,] 0          0      3
[15,] 0          0      3
[16,] 0          0      3
[17,] 0          0      3
[18,] 1          1      4
[19,] 1          0      4
[20,] 1          0      4
[21,] 1          0      4
[22,] 1          0      4
[23,] 1          0      4
[24,] 0          1      5
[25,] 0          0      5
[26,] 0          0      5

Each run of identical values now has its own group id, which is everything you need.
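The same change-point idea can also be written with base R's diff function. This is a sketch of an equivalent formulation, not the code from the answer; the names x, change, and groups_alt are illustrative.

```r
my_column <- matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))
x <- my_column[, 1]

# diff() gives x[i + 1] - x[i]; a non-zero difference marks the start of a new run
change <- c(0, diff(x) != 0)

# The cumulative sum of the change flags gives a run id that starts at 0
groups_alt <- cumsum(change)

# Total number of runs: the last id plus one
n_total <- max(groups_alt) + 1
```

Flagging changes with != 0 instead of taking the absolute difference keeps the ids consecutive even if the column holds values other than 0 and 1.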

EDIT:

To count the number of groups, you can do:

library(dplyr)
my_df <- data.frame(original = my_column, new_column = new_column, group = groups)

my_df %>% group_by(original) %>% summarise(n_groups = n_distinct(group))

# A tibble: 2 x 2
  original n_groups
     <dbl>    <int>
1        0        3
2        1        3
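If you would rather not load dplyr, the same count can be sketched in base R with tapply. This is an alternative under the same assumptions as above, with illustrative names x and n_groups.

```r
x <- c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0)

# Group id per row: cumulative sum of the absolute differences
groups <- cumsum(c(0, abs(diff(x))))

# For each original value, count the distinct group ids
n_groups <- tapply(groups, x, function(g) length(unique(g)))
```

n_groups is a named vector indexed by the original values "0" and "1".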

Answer 3

You can count the 0s and 1s in any column like this:

Count_0 <- sum(my_column[,1] == 0)
Count_1 <- sum(my_column[,1] == 1)

Or use apply over the whole data frame.
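The code for this step is not shown in the answer. A minimal sketch, assuming a hypothetical data frame df whose columns a and b hold only 0s and 1s, might look like the following; the run count with sapply and rle is added here for comparison and is not part of the answer.

```r
# Hypothetical data frame of 0/1 columns (not from the original post)
df <- data.frame(a = c(1, 1, 0, 0, 1), b = c(0, 0, 1, 1, 1))

# Count the 0s and 1s in each column (MARGIN = 2 applies the function per column)
count_0 <- apply(df, 2, function(col) sum(col == 0))
count_1 <- apply(df, 2, function(col) sum(col == 1))

# Number of runs (clusters) in each column, using rle
n_runs <- sapply(df, function(col) length(rle(col)$lengths))
```

Note that count_0 and count_1 count individual entries, whereas n_runs counts clusters of consecutive identical values, which is what the question asks for.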
