This is possibly a silly question, but I am a beginner and I haven't been able to find an answer anywhere else.
Given the column in the example below, is there a way for R to automatically identify clusters of 1s and 0s, so that I can easily count how many there are in total (in this case, three clusters of 1s and three clusters of 0s)?
Thank you in advance.
> my_column = matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))
> my_column
[,1]
[1,] 1
[2,] 1
[3,] 1
[4,] 1
[5,] 1
[6,] 0
[7,] 0
[8,] 0
[9,] 0
[10,] 1
[11,] 1
[12,] 1
[13,] 0
[14,] 0
[15,] 0
[16,] 0
[17,] 0
[18,] 1
[19,] 1
[20,] 1
[21,] 1
[22,] 1
[23,] 1
[24,] 0
[25,] 0
[26,] 0
We can use rle and table:
table(rle(my_column[,1])$values)
Output:
0 1
3 3
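To see why this works, it may help to inspect what rle returns before tabulating it. The following is only a minimal sketch, assuming the same my_column as in the question; the names runs and n_clusters are illustrative and not part of the original answer.

```r
my_column <- matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))

# rle() collapses consecutive repeats into runs: one entry per cluster
runs <- rle(my_column[, 1])
runs$values   # value of each run, in order of appearance
runs$lengths  # number of rows in each run

# Counting the runs per value gives the number of clusters of 0s and of 1s
n_clusters <- table(runs$values)
```

Because each run is one cluster, runs$lengths also gives the size of every cluster if that is needed later.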
You can assign a group id to each row. An easy way is to find the points where the value changes. To do this, calculate the difference between entry i and entry i + 1 and take the absolute value. After that, you only need the cumsum function to create an id for each group:
my_column = matrix(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0))
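# 1 where a row differs from the previous row, 0 otherwise (the first row is set to 0)
new_column <- abs(c(0, my_column[-length(my_column)] - my_column[-1]))
# the running total of change points gives each cluster its own id
groups <- cumsum(new_column)
my_mat <- cbind(original = my_column, new_column = new_column, group = groups)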
> my_mat
new_column group
[1,] 1 0 0
[2,] 1 0 0
[3,] 1 0 0
[4,] 1 0 0
[5,] 1 0 0
[6,] 0 1 1
[7,] 0 0 1
[8,] 0 0 1
[9,] 0 0 1
[10,] 1 1 2
[11,] 1 0 2
[12,] 1 0 2
[13,] 0 1 3
[14,] 0 0 3
[15,] 0 0 3
[16,] 0 0 3
[17,] 0 0 3
[18,] 1 1 4
[19,] 1 0 4
[20,] 1 0 4
[21,] 1 0 4
[22,] 1 0 4
[23,] 1 0 4
[24,] 0 1 5
[25,] 0 0 5
[26,] 0 0 5
Now you have everything you need.
To count the number of groups you can do:
library(dplyr)
my_df <- data.frame(original = my_column, new_column = new_column, group = groups)
my_df %>% group_by(original) %>% summarise(n_groups = n_distinct(group))
# A tibble: 2 x 2
original n_groups
<dbl> <int>
1 0 3
2 1 3
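If you would rather not load dplyr, the same count can probably be obtained in base R. This is a sketch under the same assumptions as above (a single 0/1 column); the names x, change and n_groups are illustrative, and diff is swapped in for the manual subtraction used earlier.

```r
my_column <- matrix(c(1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0))
x <- my_column[, 1]

# TRUE wherever a value differs from the one before it (the first row never counts as a change)
change <- c(FALSE, diff(x) != 0)

# Running total of change points: one id per cluster, starting at 0
groups <- cumsum(change)

# Number of distinct cluster ids for each original value
n_groups <- tapply(groups, x, function(g) length(unique(g)))
```

The result is a named vector with one entry per distinct value in the column.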
You can count the individual 0s and 1s in any column like this (note that this counts values, not clusters):
Count_0 <- sum(my_column[,1] == 0)
Count_1 <- sum(my_column[,1] == 1)
Or use apply over the whole data frame.
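The answer does not spell out the apply call, so the following is only one possible reading. It assumes a hypothetical data frame my_df whose columns a and b each hold 0/1 values; none of these names come from the original post.

```r
# Hypothetical example data: two 0/1 columns
my_df <- data.frame(a = c(1, 1, 0, 0, 1),
                    b = c(0, 0, 0, 1, 1))

# MARGIN = 2 applies the function to each column:
# count the individual 0s and 1s per column
counts <- apply(my_df, 2, function(col) c(zeros = sum(col == 0), ones = sum(col == 1)))

# Number of clusters (runs of identical values) per column, via rle()
clusters <- apply(my_df, 2, function(col) length(rle(col)$lengths))
```

Here counts is a matrix with one column per data frame column, and clusters is a named vector.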