Using dplyr to get cumulative count by group
Thanks in advance. I have the following data:
df <- data.frame(person=c(1,1,1,1,2,2,2,2,3,3,3,3),
neighborhood=c("A","A","A","A","B","B","C","C","D","D","E","F"))
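I would like to generate a new column that gives a cumulative count of the neighborhoods each person has moved through over the course of the panel. Like this: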
df2 <- data.frame(person=c(1,1,1,1,2,2,2,2,3,3,3,3),
neighborhood=c("A","A","A","A","B","B","C","C","D","D","E","F"),
moved=c(0,0,0,0,0,0,1,1,0,0,1,2)
)
Thanks again.
We can group by 'person' and then create 'moved' by calling match on 'neighborhood' against its unique values to get an index, and subtracting 1.
library(dplyr)
df %>%
group_by(person) %>%
mutate(moved = match(neighborhood, unique(neighborhood))-1)
# person neighborhood moved
# <dbl> <fctr> <dbl>
#1 1 A 0
#2 1 A 0
#3 1 A 0
#4 1 A 0
#5 2 B 0
#6 2 B 0
#7 2 C 1
#8 2 C 1
#9 3 D 0
#10 3 D 0
#11 3 E 1
#12 3 F 2
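To see why this works, here is a minimal base R sketch of the same steps on a single person's values (person 3 from the question). The variable names first_seen and idx are only illustrative.

```r
# Neighborhoods for one person, in panel order (person 3 from the question)
x <- c("D", "D", "E", "F")

# unique() keeps values in order of first appearance
first_seen <- unique(x)

# match() returns the position of each value within first_seen
idx <- match(x, first_seen)

# Subtracting 1 makes the starting neighborhood count as zero moves
moved <- idx - 1
print(moved)
```

Inside group_by(person), mutate applies the same calculation to each person's rows separately.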
Or use factor with its levels specified as the unique values of 'neighborhood', coerce it to integer, and subtract 1.
df %>%
group_by(person) %>%
mutate(moved = as.integer(factor(neighborhood, levels = unique(neighborhood)))-1)
# person neighborhood moved
# <dbl> <fctr> <dbl>
#1 1 A 0
#2 1 A 0
#3 1 A 0
#4 1 A 0
#5 2 B 0
#6 2 B 0
#7 2 C 1
#8 2 C 1
#9 3 D 0
#10 3 D 0
#11 3 E 1
#12 3 F 2
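The levels argument matters here. As a small sketch on a made-up vector (not from the question's data), compare the default level order with levels taken from unique:

```r
# Hypothetical sequence where the first neighborhood is not first alphabetically
x <- c("C", "A", "C")

# Default factor levels are sorted, so "A" becomes level 1
default_codes <- as.integer(factor(x)) - 1

# Levels in order of first appearance, so the starting neighborhood "C" is level 1
ordered_codes <- as.integer(factor(x, levels = unique(x))) - 1

print(default_codes)
print(ordered_codes)
```

Only the second version gives 0 for the neighborhood each person starts in.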
This can also be done easily with the rleid or frank functions from the data.table package:
library(data.table)
# with 'rleid'
setDT(df)[, moved := rleid(neighborhood)-1, by = person]
# with 'frank'
setDT(df)[, moved := frank(neighborhood, ties.method='dense')-1, by = person]
Result:
> df
person neighborhood moved
1: 1 A 0
2: 1 A 0
3: 1 A 0
4: 1 A 0
5: 2 B 0
6: 2 B 0
7: 2 C 1
8: 2 C 1
9: 3 D 0
10: 3 D 0
11: 3 E 1
12: 3 F 2
With dplyr you can use the dense_rank function:
library(dplyr)
df %>%
group_by(person) %>%
mutate(moved = dense_rank(neighborhood)-1)
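Note that the approaches agree on the question's data because each person's neighborhoods appear in alphabetical order and are never revisited. The sketch below uses a made-up sequence and base R stand-ins (not the actual rleid or dense_rank calls) to show how run-based and rank-based numbering can differ from numbering by first appearance.

```r
# Hypothetical sequence for one person: leaves "B", then returns to it
x <- c("B", "A", "B")

# Order of first appearance (the match/unique approach)
by_first_seen <- match(x, unique(x)) - 1

# Run-based numbering, a base R stand-in for what rleid() computes:
# every change of value starts a new run, even a return to "B"
run_start <- c(TRUE, x[-1] != x[-length(x)])
by_run <- cumsum(run_start) - 1

# Rank by sorted value, a base R stand-in for dense ranking:
# "A" sorts first, so the starting neighborhood "B" does not get 0
by_rank <- match(x, sort(unique(x))) - 1

print(rbind(by_first_seen, by_run, by_rank))
```

Which numbering is wanted depends on whether a return to an earlier neighborhood should count as a new move.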
This can also be done with dplyr window functions. Here is the code:
library(dplyr)
my.df <- as_tibble(df)  # tbl_df() is deprecated in current dplyr
my.df %>%
# Per person
group_by(person) %>%
# sort by neighborhood
arrange(neighborhood) %>%
# if the neighborhood has changed compared to the row before
mutate(moved = (neighborhood != lag(neighborhood))) %>%
# turn NAs (first rows) into FALSE
mutate(moved = ifelse(is.na(moved), FALSE, moved)) %>%
# use cumulative sum of the logical column to get number of moves
mutate(no_moves = cumsum(moved))
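The code above sorts by neighborhood so that each neighborhood forms a single run before the changes are counted. As an alternative sketch in base R (a swapped-in technique, not the answer's method), the same count can be obtained without reordering rows by cumulatively summing first-time appearances with duplicated, applied per person through ave:

```r
df <- data.frame(person = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 neighborhood = c("A","A","A","A","B","B","C","C","D","D","E","F"))

# For each person's row indices i, flag neighborhoods seen for the first time,
# take the running total, and subtract 1 so the starting neighborhood is 0
df$moved <- ave(seq_along(df$neighborhood), df$person,
                FUN = function(i) cumsum(!duplicated(df$neighborhood[i])) - 1)

print(df)
```

Unlike the match-based index, this running total never decreases, so a person who returns to an earlier neighborhood keeps the count reached so far.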