Return number based on grouping variables
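I am working with a bird dataset in which each individual (ID) has the territory where it was born (TERR), its year of birth (YOB), and the number of days into that year on which it was born (DOB).
Often several individuals share the same TERR, year and DOB. More individuals may be born in the same TERR and year, but they will have a different DOB (in this dataset, the first group of individuals always has a lower DOB than the second group).
I want to insert a new column 'n' that returns '1' for the first group of individuals in each year, '2' for the second group and '3' for the third. When the next year starts, the number goes back to '1'.
For example: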
ID TERR YOB DOB N
1 A1 1982 148 1
2 A1 1982 148 1
3 A1 1982 148 1
4 A1 1982 185 2
5 A1 1982 185 2
6 A1 1985 137 1
7 A1 1985 137 1
8 BIAN 1989 132 1
9 BIAN 1989 132 1
10 BIAN 1989 132 1
11 BIAN 1992 155 1
12 BIAN 1992 155 1
13 BIAN 1992 155 1
14 BIAN 1992 254 2
15 BIAN 1992 254 2
16 BIAN 1992 254 2
17 BIAN 1994 164 1
18 BIAN 1994 164 1
19 GATE 1998 119 1
20 GATE 1998 119 1
21 GATE 1998 172 2
22 GATE 1998 172 2
23 GATE 1998 172 2
24 GATE 1999 153 1
25 GATE 1999 153 1
I am very new to R, so any help is much appreciated. I have been trying to use the if_else function, but have not had any success with it.
After grouping by 'TERR' and 'YOB', take the match of 'DOB' against the unique elements of 'DOB':
library(dplyr)
out <- df1 %>%
group_by(TERR, YOB) %>%
mutate(N1 = match(DOB, unique(DOB)))
identical(out$N, out$N1)
#[1] TRUE
out
# A tibble: 25 x 6
# Groups: TERR, YOB [7]
# ID TERR YOB DOB N N1
# <int> <chr> <int> <int> <int> <int>
# 1 1 A1 1982 148 1 1
# 2 2 A1 1982 148 1 1
# 3 3 A1 1982 148 1 1
# 4 4 A1 1982 185 2 2
# 5 5 A1 1982 185 2 2
# 6 6 A1 1985 137 1 1
# 7 7 A1 1985 137 1 1
# 8 8 BIAN 1989 132 1 1
# 9 9 BIAN 1989 132 1 1
#10 10 BIAN 1989 132 1 1
# ... with 15 more rows
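Since the question states that the first group in a year always has the lower DOB, ranking the values is another option. The sketch below uses dplyr's dense_rank() on a small made-up data frame (toy and its values are illustrative, not the asker's data). Unlike match(DOB, unique(DOB)), it numbers groups by DOB value rather than by order of appearance, so it does not depend on the rows being sorted.

```r
library(dplyr)

# Toy data (hypothetical): rows within A1/1982 are deliberately out of order
toy <- data.frame(
  TERR = c("A1", "A1", "A1", "A1", "B2"),
  YOB  = c(1982L, 1982L, 1982L, 1985L, 1982L),
  DOB  = c(185L, 148L, 148L, 137L, 132L)
)

ranked <- toy %>%
  group_by(TERR, YOB) %>%
  # dense_rank() gives 1 to the smallest DOB in each group, 2 to the next, ...
  mutate(N1 = dense_rank(DOB)) %>%
  ungroup()

ranked$N1
```

On data that is already sorted by DOB within each TERR/YOB group, as in the question, this should give the same numbering as the match() approach.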
Or convert 'DOB' to a factor and coerce it to its integer codes:
df1 %>%
group_by(TERR, YOB) %>%
mutate(N1 = as.integer(factor(DOB, levels = unique(DOB))))
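The levels = unique(DOB) argument matters here: without it, factor() sorts the levels, so the codes follow the DOB values instead of their order of appearance. A small sketch on a made-up vector (x is illustrative only):

```r
# Hypothetical DOB values for one TERR/YOB group, not sorted
x <- c(185L, 148L, 185L)

# Default levels are sorted (148, 185), so 185 gets code 2
by_value <- as.integer(factor(x))

# Levels in order of first appearance (185, 148), so 185 gets code 1
by_appearance <- as.integer(factor(x, levels = unique(x)))

by_value       # 2 1 2
by_appearance  # 1 2 1
```

On the asker's data both variants agree, because DOB is already ascending within each group.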
The same approach can be used in base R with ave:
with(df1, ave(DOB, TERR, YOB, FUN = function(x) match(x, unique(x))))
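The ave() call returns a plain vector, so it has to be assigned back to the data frame to get the new column. A minimal sketch on a made-up data frame (toy and the column name N1 are illustrative):

```r
# Toy data (hypothetical), already sorted by DOB within each TERR/YOB group
toy <- data.frame(
  TERR = c("A1", "A1", "A1", "A1", "BIAN"),
  YOB  = c(1982L, 1982L, 1982L, 1985L, 1989L),
  DOB  = c(148L, 148L, 185L, 137L, 132L)
)

# ave() applies FUN within each TERR/YOB combination and returns
# the results in the original row order
toy$N1 <- with(toy, ave(DOB, TERR, YOB,
                        FUN = function(x) match(x, unique(x))))

toy$N1
```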
The data used above:
df1 <- structure(list(ID = 1:25, TERR = c("A1", "A1", "A1", "A1", "A1",
"A1", "A1", "BIAN", "BIAN", "BIAN", "BIAN", "BIAN", "BIAN", "BIAN",
"BIAN", "BIAN", "BIAN", "BIAN", "GATE", "GATE", "GATE", "GATE",
"GATE", "GATE", "GATE"), YOB = c(1982L, 1982L, 1982L, 1982L,
1982L, 1985L, 1985L, 1989L, 1989L, 1989L, 1992L, 1992L, 1992L,
1992L, 1992L, 1992L, 1994L, 1994L, 1998L, 1998L, 1998L, 1998L,
1998L, 1999L, 1999L), DOB = c(148L, 148L, 148L, 185L, 185L, 137L,
137L, 132L, 132L, 132L, 155L, 155L, 155L, 254L, 254L, 254L, 164L,
164L, 119L, 119L, 172L, 172L, 172L, 153L, 153L), N = c(1L, 1L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L)), .Names = c("ID", "TERR", "YOB",
"DOB", "N"), class = "data.frame", row.names = c(NA, -25L))
Here is a data.table solution:
library("data.table")
DT <- fread(
"ID TERR YOB DOB N
1 A1 1982 148 1
2 A1 1982 148 1
3 A1 1982 148 1
4 A1 1982 185 2
5 A1 1982 185 2
6 A1 1985 137 1
7 A1 1985 137 1
8 BIAN 1989 132 1
9 BIAN 1989 132 1
10 BIAN 1989 132 1
11 BIAN 1992 155 1
12 BIAN 1992 155 1
13 BIAN 1992 155 1
14 BIAN 1992 254 2
15 BIAN 1992 254 2
16 BIAN 1992 254 2
17 BIAN 1994 164 1
18 BIAN 1994 164 1
19 GATE 1998 119 1
20 GATE 1998 119 1
21 GATE 1998 172 2
22 GATE 1998 172 2
23 GATE 1998 172 2
24 GATE 1999 153 1
25 GATE 1999 153 1")
DT[, N2:=rleidv(DOB), .(TERR, YOB)][]
# > DT[, N2:=rleidv(DOB), .(TERR, YOB)][]
# ID TERR YOB DOB N N2
# 1: 1 A1 1982 148 1 1
# 2: 2 A1 1982 148 1 1
# 3: 3 A1 1982 148 1 1
# 4: 4 A1 1982 185 2 2
# 5: 5 A1 1982 185 2 2
# 6: 6 A1 1985 137 1 1
# 7: 7 A1 1985 137 1 1
# 8: 8 BIAN 1989 132 1 1
# 9: 9 BIAN 1989 132 1 1
# 10: 10 BIAN 1989 132 1 1
# 11: 11 BIAN 1992 155 1 1
# 12: 12 BIAN 1992 155 1 1
# 13: 13 BIAN 1992 155 1 1
# 14: 14 BIAN 1992 254 2 2
# 15: 15 BIAN 1992 254 2 2
# 16: 16 BIAN 1992 254 2 2
# 17: 17 BIAN 1994 164 1 1
# 18: 18 BIAN 1994 164 1 1
# 19: 19 GATE 1998 119 1 1
# 20: 20 GATE 1998 119 1 1
# 21: 21 GATE 1998 172 2 2
# 22: 22 GATE 1998 172 2 2
# 23: 23 GATE 1998 172 2 2
# 24: 24 GATE 1999 153 1 1
# 25: 25 GATE 1999 153 1 1
# ID TERR YOB DOB N N2
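One thing to keep in mind: rleidv() (like rleid()) numbers runs of consecutive equal values, so it agrees with the match()/unique() approach only when equal DOB values sit next to each other within each group, as they do in the data above. The sketch below uses made-up, unsorted values (toy and the column names run_id, match_id and rank_id are illustrative) to show the difference, with data.table's frank(ties.method = "dense") as one possible order-independent alternative.

```r
library(data.table)

# Hypothetical data: the two 148s are not adjacent
toy <- data.table(
  TERR = "A1",
  YOB  = 1982L,
  DOB  = c(148L, 185L, 148L)
)

# Run-length id: a new number every time DOB changes from the previous row
toy[, run_id   := rleid(DOB),                        by = .(TERR, YOB)]
# Position of each DOB among the distinct values, in order of appearance
toy[, match_id := match(DOB, unique(DOB)),           by = .(TERR, YOB)]
# Dense rank by DOB value, independent of row order
toy[, rank_id  := frank(DOB, ties.method = "dense"), by = .(TERR, YOB)]

toy[]
```

Here run_id comes out as 1, 2, 3 while match_id and rank_id are 1, 2, 1, so sorting by TERR, YOB and DOB first (or using one of the other two) is safer if the input order is not guaranteed.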