How do I count only previous values without using summarize in R?
This is my dataset.
num col1
1 SENSOR_01
2 SENSOR_01
3 SENSOR_01
4 SENSOR_05
5 SENSOR_05
6 SENSOR_05
7 NA
8 SENSOR_01
9 SENSOR_01
10 SENSOR_05
11 SENSOR_05
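structure(list(num = 1:11, col1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L), .Label = c("SENSOR_01", "SENSOR_05"), class = "factor"), count = c(3L, 3L, 3L, 3L, 3L, 3L, 0L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -11L))
I only want to count consecutive repeated rows. In rows 1-3, SENSOR_01 is repeated 3 times, so count = 3. This is my expected result.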
num col1 count
1 SENSOR_01 3
2 SENSOR_01 3
3 SENSOR_01 3
4 SENSOR_05 3
5 SENSOR_05 3
6 SENSOR_05 3
7 NA 1
8 SENSOR_01 2
9 SENSOR_01 2
10 SENSOR_05 2
11 SENSOR_05 2
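How can I get this result using dplyr?
We can use rleid (from data.table) to create groups of consecutive equal values, then count the number of rows in each group.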
library(dplyr)
df %>%
  group_by(group = data.table::rleid(col1)) %>%
  mutate(n = n()) %>%
  ungroup() %>%
  dplyr::select(-group)
# A tibble: 11 x 4
# num col1 count n
# <int> <fct> <int> <int>
# 1 1 SENSOR_01 3 3
# 2 2 SENSOR_01 3 3
# 3 3 SENSOR_01 3 3
# 4 4 SENSOR_05 3 3
# 5 5 SENSOR_05 3 3
# 6 6 SENSOR_05 3 3
# 7 7 NA 1 1
# 8 8 SENSOR_01 2 2
# 9 9 SENSOR_01 2 2
#10 10 SENSOR_05 2 2
#11 11 SENSOR_05 2 2
Both columns are kept for comparison.
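To make the grouping step concrete, here is a minimal base R sketch that emulates run ids for this data without data.table. It assumes col1 is a character vector, and the placeholder string "<NA>" is my own choice (not from the answer) and is assumed not to occur as a real value.

```r
col1 <- c("SENSOR_01", "SENSOR_01", "SENSOR_01",
          "SENSOR_05", "SENSOR_05", "SENSOR_05", NA,
          "SENSOR_01", "SENSOR_01", "SENSOR_05", "SENSOR_05")

# Replace NA with a placeholder so the comparisons below never return NA
key <- ifelse(is.na(col1), "<NA>", col1)

# A new run starts at row 1 and wherever the value differs from the previous row
run_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))
run_id
# 1 1 1 2 2 2 3 4 4 5 5

# Size of the run each row belongs to
n <- ave(seq_along(run_id), run_id, FUN = length)
n
# 3 3 3 3 3 3 1 2 2 2 2
```

Grouping by run_id rather than by col1 is what keeps the second block of SENSOR_01 rows (8-9) separate from the first (1-3).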
Or using data.table:
library(data.table)
setDT(df)[, n := .N, by = rleid(col1)]
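As another option, we can use a variable of row indices (the rownames of a traditional data.frame). The idea is simple: within each value of col1, a jump of more than 1 between consecutive row indices marks the start of a new run.
In tidyverse: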
library(dplyr)
dat %>%
  mutate(tmp = 1:n()) %>%   # row index
  group_by(col1) %>%
  # within each sensor, a gap > 1 in the row index starts a new run
  add_count(tmp = cumsum(c(0, diff(tmp)) > 1)) %>%
  ungroup() %>%
  select(-tmp)
# # A tibble: 11 x 3
# num col1 n
# <int> <fct> <int>
# 1 1 SENSOR_01 3
# 2 2 SENSOR_01 3
# 3 3 SENSOR_01 3
# 4 4 SENSOR_05 3
# 5 5 SENSOR_05 3
# 6 6 SENSOR_05 3
# 7 7 NA 1
# 8 8 SENSOR_01 2
# 9 9 SENSOR_01 2
# 10 10 SENSOR_05 2
# 11 11 SENSOR_05 2
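As a worked illustration of the index-gap idea, the sketch below traces the computation for the SENSOR_01 rows only, in base R. The row indices 1, 2, 3, 8, 9 are read off the sample data; the variable names are mine.

```r
# Row indices at which col1 == "SENSOR_01" in the sample data
idx <- c(1L, 2L, 3L, 8L, 9L)

# Distance to the previous SENSOR_01 row (0 for the first one)
gaps <- c(0, diff(idx))
gaps
# 0 1 1 5 1

# A gap larger than 1 means other rows sat in between, so a new run starts
sub_id <- cumsum(gaps > 1)
sub_id
# 0 0 0 1 1

# Run sizes within SENSOR_01: 3 rows in the first run, 2 in the second
run_sizes <- as.vector(table(sub_id))
run_sizes
# 3 2
```

Counting rows per (col1, sub_id) pair then gives 3 for rows 1-3 and 2 for rows 8-9, matching the n column above.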
Data:
dat <- structure(
  list(
    num = 1:11,
    col1 = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L),
      .Label = c("SENSOR_01", "SENSOR_05"),
      class = "factor")
  ),
  class = "data.frame",
  row.names = c(NA, -11L)
)
We can use rle from base R to create the 'count' column:
df$count <- with(rle(df$col1), rep(lengths, lengths))
df$count
#[1] 3 3 3 3 3 3 1 2 2 2 2
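To show what the one-liner above is doing, here is a small sketch that inspects the rle result step by step. It also covers one assumption worth flagging: rle() expects an atomic vector, so if col1 is a factor (as in the question's dput), converting it with as.character() first is one way around the error.

```r
col1 <- c("SENSOR_01", "SENSOR_01", "SENSOR_01",
          "SENSOR_05", "SENSOR_05", "SENSOR_05", NA,
          "SENSOR_01", "SENSOR_01", "SENSOR_05", "SENSOR_05")

runs <- rle(col1)
runs$lengths
# 3 3 1 2 2

# Repeat each run length once per row of that run
count <- rep(runs$lengths, runs$lengths)
count
# 3 3 3 3 3 3 1 2 2 2 2

# Factor input: convert to character before calling rle()
f <- factor(col1)
count_f <- with(rle(as.character(f)), rep(lengths, lengths))
```

Here count_f comes out the same as count.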
Or the dplyr implementation of the above:
library(dplyr)
df %>%
  mutate(count = with(rle(col1), rep(lengths, lengths)))
Or an option with tidyverse that does not involve any other packages:
library(dplyr)
library(tidyr)  # replace_na() comes from tidyr
df %>%
  group_by(grp = replace_na(col1, "VALUE"),
           grp = cumsum(grp != lag(grp, default = first(grp)))) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  select(-grp)
# A tibble: 11 x 3
# num col1 count
# <int> <chr> <int>
# 1 1 SENSOR_01 3
# 2 2 SENSOR_01 3
# 3 3 SENSOR_01 3
# 4 4 SENSOR_05 3
# 5 5 SENSOR_05 3
# 6 6 SENSOR_05 3
# 7 7 <NA> 1
# 8 8 SENSOR_01 2
# 9 9 SENSOR_01 2
#10 10 SENSOR_05 2
#11 11 SENSOR_05 2
Data:
df <- structure(
  list(
    num = 1:11,
    col1 = c("SENSOR_01", "SENSOR_01", "SENSOR_01",
             "SENSOR_05", "SENSOR_05", "SENSOR_05", NA,
             "SENSOR_01", "SENSOR_01", "SENSOR_05", "SENSOR_05")
  ),
  class = "data.frame",
  row.names = c(NA, -11L)
)