ddply: how to include a character vector in result

Sorry for the cryptic title; I couldn't find a better summary of my problem. I have a data frame and want to apply diff() within groups, which works fine:

df <- data.frame(name = rep(c("a", "b", "c"), 4),
                 index = rep(c("c1", "c2"), each = 6),
                 year = rep(2008:2010, 4),
                 value = rep(1:3, each = 4))

head(df)

  name index year value
1    a    c1 2008     1
2    b    c1 2009     1
3    c    c1 2010     1

library(plyr)
ddply(df, .(name, year), summarize, value=diff(value))

However, I would like to include the index in my result, which I tried to do with:

ddply(df, .(name, year), summarize,  value=diff(value), index=index)

Yet this yields the error message:

length(rows) == 1 is not TRUE

I guess this is because index has more rows than the result, since it is not processed by diff(). Is there a quick solution to my problem?
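As a small illustration of the suspected length mismatch, here is a base R sketch using the values and index of the rows with name "a" from the example data. The variable names d and aligned are only for illustration.

```r
# Values and index for one group of the example data (name "a").
value <- c(1, 1, 2, 3)
index <- c("c1", "c1", "c2", "c2")

d <- diff(value)   # diff() returns one element fewer than its input
length(d)          # 3
length(index)      # 4, so the two columns cannot be combined as they are

# Dropping the first index entry makes the lengths agree.
aligned <- data.frame(value = d, index = index[-1])
```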

Thank you very much!

EDIT

Let me clarify what I want to add to the result:

Take the variable index above. It is a factor that is meant to explain something. Taking diff() of it would not make sense, so I just want to pass it through unchanged. I tried drop==FALSE, which yielded the same error message.

Sorry for all the confusion! Here's a very simple example:

name year  index  value
 a   2008    c1    10
 a   2009    c2    30
 a   2010    c1    40

After taking diffs within group 'a', this looks like:

name year index d.value 
 a   2009  c2     +20  # c2 is passed through unchanged; only the first row is intentionally dropped
 a   2010  c1     +10

Think of the unfortunately named index as something like an attribute: it can change over the years, but taking diff() of it would not make sense.

I really hope this gives you a clue about what I want. If not, I'll delete the question, because I found an inelegant workaround ;) Sorry for all the inconvenience!

I'm not entirely sure what you want, but it sounds like you want to get diffs while keeping the index variable and dropping the first row of each group. Does this get you what you want?

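doSummary = function(df) {
  values = diff(df$value)    # differences within the group
  # Drop the first entry so index lines up with diff();
  # length(df) would count columns, not rows.
  indexes = df$index[-1]
  data.frame(d.value=values, index=indexes)
}
ddply(df, .(name, year), doSummary)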
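If the goal is the output from the simple example in the question, the grouping probably needs to be by name only, with year carried along like index. Here is a minimal base R sketch under that assumption; ex and diff_keep_rest are hypothetical names, and the data is retyped from the question's example.

```r
# Hypothetical data mirroring the simple example in the question.
ex <- data.frame(name  = c("a", "a", "a"),
                 year  = c(2008, 2009, 2010),
                 index = c("c1", "c2", "c1"),
                 value = c(10, 30, 40))

# Hypothetical helper: diff the value column and keep the other
# columns from every row except the first.
diff_keep_rest <- function(g) {
  g <- g[order(g$year), ]   # diffs assume the rows are in year order
  out <- g[-1, c("name", "year", "index"), drop = FALSE]
  out$d.value <- diff(g$value)
  out
}

# Base R split-apply-combine over name.
res <- do.call(rbind, lapply(split(ex, ex$name), diff_keep_rest))
rownames(res) <- NULL
res   # two rows: 2009/c2 with d.value 20, and 2010/c1 with d.value 10
```

With plyr loaded, the same helper can be applied with ddply(ex, .(name), diff_keep_rest) instead of the split/lapply/rbind line.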
