Identify clusters of identical values in a column of 1s and 0s
This is possibly a silly question, but I am a beginner and I haven't been able to find an answer anywhere else.
Given the column in the example below, is there a way for R to automatically identify clusters of 1s and 0s, so that I can easily count how many there are in total (in this case, three clusters of 1s and three clusters of 0s)?
Thank you in advance.
> my_column = matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))
> my_column
[,1]
[1,] 1
[2,] 1
[3,] 1
[4,] 1
[5,] 1
[6,] 0
[7,] 0
[8,] 0
[9,] 0
[10,] 1
[11,] 1
[12,] 1
[13,] 0
[14,] 0
[15,] 0
[16,] 0
[17,] 0
[18,] 1
[19,] 1
[20,] 1
[21,] 1
[22,] 1
[23,] 1
[24,] 0
[25,] 0
[26,] 0
We can use rle and table:
table(rle(my_column[,1])$values)
Output:
0 1
3 3
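To see why this works, it may help to look at what rle returns before it is passed to table. The sketch below reuses the column from the question; the variable names runs and run_counts are only illustrative. rle gives one entry per run, so tabulating its values counts runs rather than individual rows.

```r
# Same data as in the question
my_column <- matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))

# rle() collapses consecutive identical values into runs
runs <- rle(my_column[, 1])

runs$values   # the value of each run, in order of appearance
runs$lengths  # how many rows each run spans

# One entry per run, so table() counts runs per value
run_counts <- table(runs$values)
run_counts
```

The lengths component is also useful on its own if you need the size of each cluster, not just how many there are.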
You can try assigning a group to each row. An easy way is to find the points of change. To do this, simply calculate the difference between entry i and entry i + 1 and take the absolute value. After that, you only need the cumsum function to create an id for each group:
my_column = matrix(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0))
new_column <- abs(c(0, my_column[-length(my_column)] - my_column[-1]))
groups <- cumsum(new_column)
my_mat <- cbind(original = my_column, new_column = new_column, group = groups)
> my_mat
new_column group
[1,] 1 0 0
[2,] 1 0 0
[3,] 1 0 0
[4,] 1 0 0
[5,] 1 0 0
[6,] 0 1 1
[7,] 0 0 1
[8,] 0 0 1
[9,] 0 0 1
[10,] 1 1 2
[11,] 1 0 2
[12,] 1 0 2
[13,] 0 1 3
[14,] 0 0 3
[15,] 0 0 3
[16,] 0 0 3
[17,] 0 0 3
[18,] 1 1 4
[19,] 1 0 4
[20,] 1 0 4
[21,] 1 0 4
[22,] 1 0 4
[23,] 1 0 4
[24,] 0 1 5
[25,] 0 0 5
[26,] 0 0 5
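As a side note, the same group ids can probably be written a little more compactly with base R's diff, which computes the differences between consecutive entries. This is only a sketch on a plain vector; the names x, change and group are illustrative and not part of the code above.

```r
# Same values as the column above, as a plain vector
x <- c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0)

# diff(x) is x[i + 1] - x[i]; its absolute value is 1 exactly where the value changes.
# The leading 0 keeps the result the same length as x.
change <- c(0, abs(diff(x)))

# The running total of change points gives one id per run, starting at 0
group <- cumsum(change)
group
```

For data that is not strictly 0/1, replacing abs(diff(x)) with diff(x) != 0 should give the same kind of ids.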
Now you have everything you need.
To count the number of groups you can do:
library(dplyr)
my_df <- data.frame(original = my_column, new_column = new_column, group = groups)
my_df %>% group_by(original) %>% summarise(n_groups = n_distinct(group))
# A tibble: 2 x 2
original n_groups
<dbl> <int>
1 0 3
2 1 3
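If you would rather not load dplyr, a base R version of the same count is possible with tapply. The sketch below rebuilds the group ids so that it runs on its own; x, group and n_groups are illustrative names.

```r
# Values and group ids, rebuilt here so the example is self-contained
x <- c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0)
group <- cumsum(c(0, abs(diff(x))))

# For each original value (0 or 1), count how many distinct group ids it has
n_groups <- tapply(group, x, function(g) length(unique(g)))
n_groups
```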
You can count the 0s and 1s in any column like this (note that this counts individual values, not clusters):
Count_0 <- sum(my_column[,1] == 0)
Count_1 <- sum(my_column[,1] == 1)
Or use apply over the whole data frame.
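The answer does not show the apply call, so here is one way it might look. The data frame example_df is made up for illustration and is not from the original post. The last line applies rle to each column as well, since the question is about clusters rather than individual values.

```r
# Hypothetical data frame of 0s and 1s (not from the original post)
example_df <- data.frame(a = c(1, 1, 0, 0, 1),
                         b = c(0, 0, 0, 1, 1))

# MARGIN = 2 applies the function to each column
zeros_per_column <- apply(example_df, 2, function(col) sum(col == 0))
ones_per_column  <- apply(example_df, 2, function(col) sum(col == 1))

# Number of runs (clusters) in each column, using rle()
runs_per_column <- apply(example_df, 2, function(col) length(rle(col)$lengths))

zeros_per_column
ones_per_column
runs_per_column
```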