Conditional count in a dataframe
I have a dataframe (df) with the following structure:
ID person_id person_type proof_id
A1 P1 applicant A1321
A1 P1 applicant A423412
A1 P1 applicant W352352
A1 P2 co_applicant D43252
A1 P2 co_applicant G43222
A2 P5 applicant K5647
A2 P5 applicant Pu7e5
A2 P6 co_applicant L032u4
A2 P7 co_applicant Q3344
I am trying to add another column, final, that distinguishes between the two co-applicants of A2 (P6 and P7):
ID person_id person_type proof_id final
A1 P1 applicant A1321 applicant1
A1 P1 applicant A423412 applicant1
A1 P1 applicant W352352 applicant1
A1 P2 co_applicant D43252 co_applicant1
A1 P2 co_applicant G43222 co_applicant1
A2 P5 applicant K5647 applicant1
A2 P5 applicant Pu7e5 applicant1
A2 P6 co_applicant L032u4 co_applicant1
A2 P7 co_applicant Q3344 co_applicant2
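Stated as a rule, the desired `final` column numbers each distinct `person_id` within an `ID`/`person_type` pair, in the order the persons first appear. The following is a minimal base R sketch of that rule, not one of the posted approaches: it swaps in base R's `ave()` and `match()` and assumes the sample data is typed in by hand as `df`.

```r
# Sample data from the question, rebuilt as a base R data frame
df <- data.frame(
  ID          = c("A1", "A1", "A1", "A1", "A1", "A2", "A2", "A2", "A2"),
  person_id   = c("P1", "P1", "P1", "P2", "P2", "P5", "P5", "P6", "P7"),
  person_type = c("applicant", "applicant", "applicant",
                  "co_applicant", "co_applicant",
                  "applicant", "applicant",
                  "co_applicant", "co_applicant"),
  proof_id    = c("A1321", "A423412", "W352352", "D43252", "G43222",
                  "K5647", "Pu7e5", "L032u4", "Q3344"),
  stringsAsFactors = FALSE
)

# Within each (ID, person_type) group, match() maps every row's person_id
# to its position among the group's distinct person_ids (first appearance).
# ave() writes the results back into a character vector, so they become "1", "2", ...
person_no <- ave(df$person_id, df$ID, df$person_type,
                 FUN = function(x) match(x, unique(x)))

df$final <- paste0(df$person_type, person_no)
df$final
```

`match(x, unique(x))` gives every row of the same person the same number, however many proof rows that person has, which a per-row counter such as `1:n()` cannot do.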
I tried the following, but it increments the counter for every row:
df <- df %>% group_by(ID, person_type, person_id ) %>%
mutate(final = paste(person_type, 1:n()))
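The attempt numbers rows because `person_id` is one of the grouping variables: each person becomes its own group, so `n()` is that person's row count and `1:n()` counts their proof rows. `paste()` also inserts a space before the number. A small sketch reproducing this on the `A1` rows of the sample data (assumes dplyr is installed):

```r
library(dplyr)

# The A1 rows of the sample data (proof_id omitted, it plays no role here)
df <- data.frame(
  ID          = c("A1", "A1", "A1", "A1", "A1"),
  person_id   = c("P1", "P1", "P1", "P2", "P2"),
  person_type = c("applicant", "applicant", "applicant",
                  "co_applicant", "co_applicant"),
  stringsAsFactors = FALSE
)

attempt <- df %>%
  group_by(ID, person_type, person_id) %>%
  # each person is its own group here, so 1:n() numbers that person's rows
  mutate(final = paste(person_type, 1:n())) %>%
  ungroup()

attempt$final
```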
You most likely want to group only by ID and person_type:
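library(data.table)
# 1:length(unique(person_id)) yields one number per distinct person, so it lines up
# with the rows only when a group has one person or one row per person (as here)
setDT(df)[, final := paste0(person_type, 1:length(unique(person_id))), by = .(ID, person_type)]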
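`1:length(unique(person_id))` produces one number per distinct person, not one per row, so it matches the rows only when a group holds a single person or each person has a single row, as in the sample data. A sketch of a variant that also covers a person with several rows, assuming a hypothetical table in which `P6` has a second proof row (`X0001` is made up): `match(person_id, unique(person_id))` returns one number per row and reuses the same number for a repeated person.

```r
library(data.table)

# Hypothetical data: P6 has two proof rows (X0001 is invented for illustration)
dt <- data.table(
  ID          = c("A2", "A2", "A2", "A2"),
  person_id   = c("P5", "P6", "P6", "P7"),
  person_type = c("applicant", "co_applicant", "co_applicant", "co_applicant"),
  proof_id    = c("K5647", "L032u4", "X0001", "Q3344")
)

# match() gives each row the position of its person among the group's
# distinct persons, so repeated persons reuse the same number
dt[, final := paste0(person_type, match(person_id, unique(person_id))),
   by = .(ID, person_type)]

dt$final
```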
With dplyr you can use n_distinct:
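library(dplyr)
df %>%
  group_by(ID, person_type) %>%
  # 1:n_distinct(person_id) gives one number per distinct person, not per row;
  # mutate() accepts it here only because its length is 1 or equals the group size
  mutate(final = paste0(person_type, 1:n_distinct(person_id)))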
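`1:n_distinct(person_id)` yields one number per distinct person rather than one per row. That works on the sample data, but if a group contained two persons spread over three rows, the vector would have length 2 and `mutate()` would reject it, since the result must have length 1 or the group size. A minimal sketch that numbers each row instead with `match(person_id, unique(person_id))`, assuming dplyr is installed and a hypothetical variant of the data in which `P6` has a second proof row (`X0001` is invented):

```r
library(dplyr)

# Hypothetical data: P6 has two proof rows (X0001 is invented for illustration)
df <- data.frame(
  ID          = c("A2", "A2", "A2", "A2"),
  person_id   = c("P5", "P6", "P6", "P7"),
  person_type = c("applicant", "co_applicant", "co_applicant", "co_applicant"),
  proof_id    = c("K5647", "L032u4", "X0001", "Q3344"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(ID, person_type) %>%
  # match() returns one number per row: the position of that row's person
  # among the group's distinct persons, in order of first appearance
  mutate(final = paste0(person_type, match(person_id, unique(person_id)))) %>%
  ungroup()

res$final
```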
You could use data.table with rleid (see ?rleid):
library(data.table)
setDT(df)[,final := paste0(person_type, rleid(person_id)),
by = c("ID", "person_type")]
> df
ID person_id person_type proof_id final
1: A1 P1 applicant A1321 applicant1
2: A1 P1 applicant A423412 applicant1
3: A1 P1 applicant W352352 applicant1
4: A1 P2 co_applicant D43252 co_applicant1
5: A1 P2 co_applicant G43222 co_applicant1
6: A2 P5 applicant K5647 applicant1
7: A2 P5 applicant Pu7e5 applicant1
8: A2 P6 co_applicant L032u4 co_applicant1
9: A2 P7 co_applicant Q3344 co_applicant2
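`rleid()` starts a new number whenever `person_id` changes from one row to the next, so the result above relies on each person's rows being contiguous within the group. A small illustration with a hypothetical group in which `P6`'s rows are split by `P7`:

```r
library(data.table)

# Hypothetical person_id values within one ID/person_type group,
# where P6's rows are not next to each other
ids <- c("P6", "P7", "P6")

rleid(ids)               # 1 2 3: a new id every time the value changes
match(ids, unique(ids))  # 1 2 1: a repeated person keeps its first number
```

If rows can be interleaved like this, sorting first (for example `setorder(df, ID, person_type, person_id)`) or numbering by first appearance with `match()` keeps one number per person.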
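Notice: technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.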