
Doubts about ddply function in R

I'm trying to do the equivalent of a SQL group-by summary in R with the plyr function ddply. I have a data frame with three columns (say id, period and event). I'd like to count how many times each id appears in the data frame (count(*) ... group by id in SQL) and get the last value of the event column for each id.

Here is an example of what I have and what I'm trying to obtain:

  id period event #original data frame
  1      1     1
  2      1     0
  2      2     1
  3      1     1
  4      1     1
  4      1     0

  id  t  x #what I want to obtain
  1   1  1
  2   2  1
  3   1  1
  4   2  0

This is the simple code I've been using for that:

 library(plyr) # provides ddply and .()
 teachers.pp <- read.table("http://www.ats.ucla.edu/stat/examples/alda/teachers_pp.csv", sep = ",", header = TRUE) # whole data frame
 datos1 <- ddply(teachers.pp, .(id), function(x) c(t = length(x$id), x = x[length(x$id), 3])) # This is working fine.
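In case the dataset URL above is not reachable, here is a minimal self-contained sketch of the same ddply call run on the six-row example data typed in by hand (the name toy is hypothetical, used only for illustration). The anonymous function receives one piece per id, so x$id and x[length(x$id), 3] only see the rows of that id:

```r
library(plyr)

# Hand-typed copy of the example data from the question (not the full teachers_pp.csv)
toy <- data.frame(id     = c(1, 2, 2, 3, 4, 4),
                  period = c(1, 1, 2, 1, 1, 1),
                  event  = c(1, 0, 1, 1, 1, 0))

# Each call of the function gets the rows of a single id as `x`;
# the named vector it returns becomes one row (columns t and x) of the result.
datos1 <- ddply(toy, .(id), function(x) c(t = length(x$id), x = x[length(x$id), 3]))
print(datos1)
```

On this data the result has t = 1, 2, 1, 2 and x = 1, 1, 1, 0, which is the table the question wants.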

Now, I've been reading "The Split-Apply-Combine Strategy for Data Analysis", which gives an example using syntax equivalent to the one below:

  datos2=ddply(teachers.pp,.(id), summarise, t=length(id), x=teachers.pp[length(id),3]) #using summarise but the result is not what I want. 

This is the data frame I get from datos2:

  id  t  x
  1   1  1
  2   2  0
  3   1  1
  4   1  1

So my question is: why does this result differ from the one I get with the first piece of code (datos1)? What am I doing wrong?

It is not clear to me when I should use summarise and when transform. Could you tell me the correct syntax for the ddply function?
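To illustrate the summarise/transform distinction, here is a minimal sketch on the same hand-typed example data (toy, per_id and per_row are hypothetical names). summarise collapses each piece to one row holding only the new columns, while transform (base R's transform(), which ddply simply calls on each piece) keeps every original row and appends the new column, repeating the per-group value within each id:

```r
library(plyr)

# Hand-typed copy of the example data from the question
toy <- data.frame(id     = c(1, 2, 2, 3, 4, 4),
                  period = c(1, 1, 2, 1, 1, 1),
                  event  = c(1, 0, 1, 1, 1, 0))

# summarise: one row per id; ddply adds the grouping column id in front of t
per_id <- ddply(toy, .(id), summarise, t = length(id))

# transform: all 6 rows are kept; t (the group size) is recycled within each id
per_row <- ddply(toy, .(id), transform, t = length(id))

print(per_id)
print(per_row)
```

As a rule of thumb under this sketch: use summarise when you want one row per group, and transform when you want to add group-level columns to the original rows.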

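When you use summarise, stop referencing the original data frame. Instead, just write expressions in terms of the column names. Inside summarise, a bare column name such as id or event refers to that column of the current piece (only the rows of one id), but teachers.pp is still the whole data frame. So teachers.pp[length(id),3] returns the event value in row length(id) of the full data (row 1 for an id with one record, row 2 for an id with two), not the last event of that id.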
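A minimal sketch that makes this visible, again on the hand-typed example data (toy and check are hypothetical names; it assumes toy lives in the global environment, as teachers.pp does in the question). It puts the group size, the size of the whole data frame, the value picked by the original expression, and the last event of each group side by side:

```r
library(plyr)

# Hand-typed copy of the example data from the question
toy <- data.frame(id     = c(1, 2, 2, 3, 4, 4),
                  period = c(1, 1, 2, 1, 1, 1),
                  event  = c(1, 0, 1, 1, 1, 0))

check <- ddply(toy, .(id), summarise,
               group_rows = length(id),           # rows in the current piece only
               full_rows  = nrow(toy),            # toy is the whole data frame: always 6
               buggy_x    = toy[length(id), 3],   # row 1 or 2 of the FULL data
               last_event = tail(event, 1))       # last event within the piece
print(check)
```

For id 2 the original expression picks row 2 of the whole data frame (event 0), while the last row of that id has event 1, which is the mismatch seen in the datos2 output.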

You tried this:

ddply(teachers.pp,.(id), summarise, t=length(id), x=teachers.pp[length(id),3])

when what you probably wanted was something more like this:

ddply(teachers.pp,.(id), summarise, t=length(id), x=tail(event,1))
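A minimal sketch that runs this corrected call on the hand-typed example data (toy and datos2 are names chosen for illustration) and should reproduce the desired table:

```r
library(plyr)

# Hand-typed copy of the example data from the question
toy <- data.frame(id     = c(1, 2, 2, 3, 4, 4),
                  period = c(1, 1, 2, 1, 1, 1),
                  event  = c(1, 0, 1, 1, 1, 0))

datos2 <- ddply(toy, .(id), summarise,
                t = length(id),       # number of rows for this id
                x = tail(event, 1))   # event value in the last row of this id
print(datos2)
```

Note that tail(event, 1) takes the last row in the order the rows already appear within each id; if the data were not sorted by period, you would need to sort first (for example with order()) for "last" to mean "latest period".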

