How can I rank observations within groups faster?

Question

I have a really simple problem, but I'm probably not thinking in a vectorized enough way to solve it efficiently. I tried two different approaches, and they have been looping on two different computers for a long time now. I wish I could say the competition made it more exciting, but... bleh.

rank observations in group

I have long data (many rows per person, one row per person-observation), and I basically want a variable that tells me how many times the person has been observed so far, including the current row.

I have the first two columns and want the third one:

person  wave   obs
pers1   1999   1
pers1   2000   2
pers1   2003   3
pers2   1998   1
pers2   2001   2

Right now I'm using two loop-based approaches. Both are excruciatingly slow (150k rows). I'm sure I'm missing something, but my search queries haven't really helped yet (the problem is hard to phrase).

Thanks for any pointers!

# ordered dataset by persnr and year of observation
person.obs <- person.obs[order(person.obs$PERSNR,person.obs$wave) , ]

person.obs$n.obs = 0

# first approach: loop through people and assign range
unp = unique(person.obs$PERSNR)
unplength = length(unp)
for(i in 1:unplength) {
   print(unp[i])
   person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs = 
1:length(person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs)
    i=i+1
   gc()
}

# second approach: loop through rows and reset counter at new person
pnr = 0
for(i in 1:length(person.obs[,2])) {
  if(pnr!=person.obs[i,]$PERSNR) { pnr = person.obs[i,]$PERSNR
  e = 0
  }
  e=e+1
  person.obs[i,]$n.obs = e
  i=i+1
  gc()
}

Answer 1

The answer from Marek in this question has proven very useful in the past. I wrote it down and use it almost daily, since it is fast and efficient. We'll use ave() and seq_along().

foo <-data.frame(person=c(rep("pers1",3),rep("pers2",2)),year=c(1999,2000,2003,1998,2011))

foo <- transform(foo, obs = ave(rep(NA, nrow(foo)), person, FUN = seq_along))
foo

  person year obs
1  pers1 1999   1
2  pers1 2000   2
3  pers1 2003   3
4  pers2 1998   1
5  pers2 2011   2
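
The same idea can be applied to the data frame from the question. The following is a minimal sketch, assuming the column names PERSNR and wave from the question's code; the values are made up for illustration. Sorting first makes the counter follow chronological order within each person.

```r
# Toy stand-in for the question's data frame. The column names PERSNR and
# wave come from the question; the values are invented for this sketch.
person.obs <- data.frame(
  PERSNR = c("pers2", "pers1", "pers1", "pers2", "pers1"),
  wave   = c(2001, 2000, 1999, 1998, 2003),
  stringsAsFactors = FALSE
)

# Sort by person and wave so the counter follows chronological order.
person.obs <- person.obs[order(person.obs$PERSNR, person.obs$wave), ]

# Running index within each person, computed without an explicit loop.
person.obs$n.obs <- ave(seq_along(person.obs$PERSNR), person.obs$PERSNR,
                        FUN = seq_along)
person.obs
```

Because ave() handles the grouping internally, no per-row assignment or gc() call is needed.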

Another option, using plyr:

library(plyr)
ddply(foo, "person", transform, obs2 = seq_along(person))

  person year obs obs2
1  pers1 1999   1    1
2  pers1 2000   2    2
3  pers1 2003   3    3
4  pers2 1998   1    1
5  pers2 2011   2    2

Answer 2

A few alternatives with the data.table and dplyr packages.

data.table:

library(data.table)
# setDT(foo) is needed to convert to a data.table

# option 1:
setDT(foo)[, rn := rowid(person)]   

# option 2:
setDT(foo)[, rn := 1:.N, by = person]

Both give:

> foo
   person year rn
1:  pers1 1999  1
2:  pers1 2000  2
3:  pers1 2003  3
4:  pers2 1998  1
5:  pers2 2011  2

If you want a true rank (where tied values share a rank), you should use the frank function:

setDT(foo)[, rn := frank(year, ties.method = 'dense'), by = person]
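
To illustrate how a running index differs from a dense rank when a group contains ties, here is a small base R sketch. The data frame bar and the helper dense_rank_base are hypothetical names introduced only for this example; the helper mimics dense ranking with match() and is not part of data.table.

```r
# Illustrative data with a repeated year for pers1 (made up for this sketch).
bar <- data.frame(
  person = c("pers1", "pers1", "pers1", "pers2", "pers2"),
  year   = c(1999, 1999, 2003, 1998, 2011)
)

# Running index: counts rows, so the two tied years get 1 and 2.
bar$idx <- ave(bar$year, bar$person, FUN = seq_along)

# Dense rank (hypothetical helper): tied values share a rank and no rank
# is skipped afterwards.
dense_rank_base <- function(x) match(x, sort(unique(x)))
bar$dense <- ave(bar$year, bar$person, FUN = dense_rank_base)
bar
```

For the data in the question, where each person has at most one row per wave, both columns would coincide.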

dplyr:

library(dplyr)
# method 1
foo <- foo %>% group_by(person) %>% mutate(rn = row_number())
# method 2
foo <- foo %>% group_by(person) %>% mutate(rn = 1:n())

Both give a similar result:

> foo
Source: local data frame [5 x 3]
Groups: person [2]

  person  year    rn
  (fctr) (dbl) (int)
1  pers1  1999     1
2  pers1  2000     2
3  pers1  2003     3
4  pers2  1998     1
5  pers2  2011     2

Answer 3

Would by() do the trick?

> foo <-data.frame(person=c(rep("pers1",3),rep("pers2",2)),year=c(1999,2000,2003,1998,2011),obs=c(1,2,3,1,2))
> foo
  person year obs
1  pers1 1999   1
2  pers1 2000   2
3  pers1 2003   3
4  pers2 1998   1
5  pers2 2011   2
> by(foo, foo$person, nrow)
foo$person: pers1
[1] 3
------------------------------------------------------------ 
foo$person: pers2
[1] 2
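
Note that by(foo, foo$person, nrow) returns one count per person rather than a counter for each row. A minimal sketch of how such group sizes could be expanded into a per-row index, assuming the rows are already sorted so that each person's rows are contiguous:

```r
foo <- data.frame(
  person = c(rep("pers1", 3), rep("pers2", 2)),
  year   = c(1999, 2000, 2003, 1998, 2011)
)

# Lengths of consecutive runs of the same person (here 3 and 2).
run_lengths <- rle(as.character(foo$person))$lengths

# sequence() expands each length n into 1:n and concatenates the pieces.
foo$obs <- sequence(run_lengths)
foo
```

This relies on the sort order; if a person's rows are scattered, rle() would treat each run as a separate group.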

Answer 4

Another option, using aggregate and rank in base R:

foo$obs <- unlist(aggregate(.~person, foo, rank)[,2])

#   person year obs
# 1  pers1 1999   1
# 2  pers1 2000   2
# 3  pers1 2003   3
# 4  pers2 1998   1
# 5  pers2 2011   2

4 answers

Answer 1
14 2011-05-28 16:35:19

Answer 2
5 accepted 2016-02-17 08:48:05

Answer 3
2 2011-05-28 16:03:11

Answer 4
0 2017-05-11 15:14:24