Different ways to get summaries with data.table in R

library(data.table)
temp <- data.table(fir=c("A", "B", "B", "C", "A", "D"), sec=c(1,1,1,1,2,2))

 fir sec
  A   1
  B   1
  B   1
  C   1
  A   2
  D   2

Suppose I want a summary by the "sec" column, for example counting the number of occurrences. I can try:

method a)

 temp[,.N, by=sec]


    sec N
 1:   1 4
 2:   2 2

We get one row for each distinct value of "sec".

method b)

 temp[,Num:=.N, by=sec]

This gives the same summary, but keeps all the columns and the same number of rows (the Num column is added to temp itself).

 fir sec Num
  A   1   4
  B   1   4
  B   1   4
  C   1   4
  A   2   2
  D   2   2

But...
How can I get a result like method a) while specifying the name of the new column, without having to rename it afterwards?
I've tried Num=.N without the := but it doesn't work.

How can I get a result like method b) without explicitly writing the name of the new column and without modifying the original data.table (like ave())? I mean running something like this

 temp[,.N, by=sec]

but getting

 fir sec  N
  A   1   4
  B   1   4
  B   1   4
  C   1   4
  A   2   2
  D   2   2

We can use rep to repeat each group's count once per row:

temp[,.(Num = rep(.N, .N)), by=sec]
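As a rough sketch of the two forms side by side (the `temp` table from the question is rebuilt here so the snippet runs on its own; the names `counts` and `expanded` are only illustrative): wrapping the expression in `.()` lets the summary column be named directly, and `rep(.N, .N)` repeats the group count once for each row of the group.

```r
library(data.table)

temp <- data.table(fir = c("A", "B", "B", "C", "A", "D"),
                   sec = c(1, 1, 1, 1, 2, 2))

# One row per group, with the count column named directly in j
counts <- temp[, .(Num = .N), by = sec]

# One row per original row: each group's count repeated .N times
expanded <- temp[, .(Num = rep(.N, .N)), by = sec]
```

Note that both results contain only `sec` and `Num`; the `fir` column is not carried along.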

If we need to keep the other variables, one option is a join using on

temp[temp[, .(Num = .N), by=sec], on = .(sec)]
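A minimal sketch of the join, plus one possible alternative that is not part of the answer above: using `copy()` before `:=`, so the new column is added to a copy and `temp` itself is left untouched. The names `joined` and `with_n` are illustrative, and the column name still has to be written out in both cases.

```r
library(data.table)

temp <- data.table(fir = c("A", "B", "B", "C", "A", "D"),
                   sec = c(1, 1, 1, 1, 2, 2))

# Join the per-group counts back onto the original rows;
# temp is only read here, not modified
joined <- temp[temp[, .(Num = .N), by = sec], on = .(sec)]

# Alternative: add the column by reference on a copy of temp;
# the trailing [] lets the result print when evaluated at the console
with_n <- copy(temp)[, N := .N, by = sec][]
```

The join goes through an intermediate summary table, while the `copy()` approach duplicates the whole table first; which one is preferable likely depends on the size of the data.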
