Using dplyr to get cumulative count by group
Thanks in advance. I have the following data:
df <- data.frame(person=c(1,1,1,1,2,2,2,2,3,3,3,3),
                 neighborhood=c("A","A","A","A","B","B","C","C","D","D","E","F"))
I would like to generate a new column that gives the cumulative count of neighborhoods that each person moves through as the panel progresses. Like this:
df2 <- data.frame(person=c(1,1,1,1,2,2,2,2,3,3,3,3),
                  neighborhood=c("A","A","A","A","B","B","C","C","D","D","E","F"),
                  moved=c(0,0,0,0,0,0,1,1,0,0,1,2)
)
Thanks again.
We can group by 'person', then create 'moved' by matching 'neighborhood' against its unique values to get each value's index, and subtract 1.
df %>%
  group_by(person) %>%
  mutate(moved = match(neighborhood, unique(neighborhood))-1)
# person neighborhood moved
# <dbl> <fctr> <dbl>
#1 1 A 0
#2 1 A 0
#3 1 A 0
#4 1 A 0
#5 2 B 0
#6 2 B 0
#7 2 C 1
#8 2 C 1
#9 3 D 0
#10 3 D 0
#11 3 E 1
#12 3 F 2
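To see why the match/unique combination works, here is a minimal base R sketch on a single person's values. The toy vector `x` is hypothetical and not part of the original data; it also revisits an earlier neighborhood to show what the index does in that case.

```r
# Hypothetical toy vector for one person (not from the original data)
x <- c("B", "B", "C", "C", "B")

# unique() keeps values in order of first appearance
first_seen <- unique(x)        # "B" "C"

# match() returns, for each element, its position in first_seen
idx <- match(x, first_seen)    # 1 1 2 2 1

# Subtract 1 so the first neighborhood is counted as 0
moved <- idx - 1
print(moved)
```

Note that the last row gets 0 again: the value is the index of the neighborhood in order of first appearance, so returning to an earlier neighborhood does not increase it.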
Or use factor with levels specified as the unique values in 'neighborhood', coerce to integer, and subtract 1.
df %>%
  group_by(person) %>%
  mutate(moved = as.integer(factor(neighborhood, levels = unique(neighborhood)))-1)
# person neighborhood moved
# <dbl> <fctr> <dbl>
#1 1 A 0
#2 1 A 0
#3 1 A 0
#4 1 A 0
#5 2 B 0
#6 2 B 0
#7 2 C 1
#8 2 C 1
#9 3 D 0
#10 3 D 0
#11 3 E 1
#12 3 F 2
This can also easily be achieved with the rleid or frank functions from the data.table package:
library(data.table)
# with 'rleid'
setDT(df)[, moved := rleid(neighborhood)-1, by = person]
# with 'frank'
setDT(df)[, moved := frank(neighborhood, ties.method='dense')-1, by = person]
The result:
> df
person neighborhood moved
1: 1 A 0
2: 1 A 0
3: 1 A 0
4: 1 A 0
5: 2 B 0
6: 2 B 0
7: 2 C 1
8: 2 C 1
9: 3 D 0
10: 3 D 0
11: 3 E 1
12: 3 F 2
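The two data.table functions agree on the data above, but they are not interchangeable in general. The following sketch, using a hypothetical toy vector that is not in the original question, suggests how they differ when a person returns to an earlier neighborhood.

```r
library(data.table)

# Hypothetical toy vector: the person moves A -> B and then back to A
x <- c("A", "B", "A")

# rleid() numbers consecutive runs, so every change starts a new id
run_id <- rleid(x) - 1                           # 0 1 2

# frank(..., ties.method = "dense") ranks by sorted value,
# so the second "A" gets the same rank as the first
rank_id <- frank(x, ties.method = "dense") - 1   # 0 1 0

print(run_id)
print(rank_id)
```

So rleid counts every change of neighborhood, while frank depends on the sort order of the neighborhood labels rather than on the order in which they appear.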
With dplyr you could use the dense_rank function:
library(dplyr)
df %>%
  group_by(person) %>%
  mutate(moved = dense_rank(neighborhood)-1)
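One caveat worth checking: dense_rank ranks by sorted value, so it matches the desired output only when each person's neighborhoods first appear in sorted order, as they do in the sample data. A small sketch with a hypothetical data frame `toy` (not from the original question) compares it with the match-based approach:

```r
library(dplyr)

# Hypothetical data: one person who starts in "C" and then moves to "A"
toy <- data.frame(person = c(1, 1, 1),
                  neighborhood = c("C", "C", "A"),
                  stringsAsFactors = FALSE)

res <- toy %>%
  group_by(person) %>%
  mutate(
    # ranks by alphabetical order: "A" < "C"
    by_rank = dense_rank(neighborhood) - 1,
    # indexes by order of first appearance
    by_first_seen = match(neighborhood, unique(neighborhood)) - 1
  ) %>%
  ungroup()

print(res)
```

Here `by_rank` is 1, 1, 0 while `by_first_seen` is 0, 0, 1, so the match-based version is probably the safer choice when the labels are not already in sorted order.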
This can also be achieved using dplyr's window functions. Here is the code:
library(dplyr)
my.df <- as_tibble(df)  # tbl_df() is deprecated in current dplyr
my.df %>%
  # per person
  group_by(person) %>%
  # sort by neighborhood
  arrange(neighborhood) %>%
  # TRUE if the neighborhood has changed compared to the row before
  mutate(moved = (neighborhood != lag(neighborhood))) %>%
  # turn NAs (first row of each group) into FALSE
  mutate(moved = ifelse(is.na(moved), FALSE, moved)) %>%
  # cumulative sum of the logical column gives the number of moves
  mutate(no_moves = cumsum(moved))