
Calculate percentage summaries in data.table

If this is my dataset:

library(data.table)    
dt <- data.table(
  record=c(1:20),
  area=rep(LETTERS[1:4], c(4, 6, 3, 7)), 
  score=c(1,1:3,2:3,1,1,1,2,2,1,2,1,1,1,1,1:3),
  cluster=c("X", "Y", "Z")[c(1,1:3,3,2,1,1:3,1,1:3,3,3,3,1:3)]
)

What is the best way to calculate row-percentage summaries with data.table, equivalent to this:

prop.table(table(dt$area, dt$score), 1)*100

However, I would also like more flexibility in the inputs to this summary, for example including only records that belong to cluster 'X', or only those in clusters 'Y' and 'Z'.

dt[,.N,by=list(area,score)][,perc:=100*N/sum(N),by=area][,.SD]

and dcast.data.table (if needed) to reshape the result into wide format.
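The cluster filter asked about in the question can go in `i`, before the grouping step. Below is a minimal sketch of that idea, not a definitive implementation: `pct_by_area` and its `clusters` argument are hypothetical names introduced here for illustration, and the wide table uses `dcast` with `fill = 0` on the assumption that missing area/score combinations should show as 0 percent.

```r
library(data.table)

dt <- data.table(
  record  = 1:20,
  area    = rep(LETTERS[1:4], c(4, 6, 3, 7)),
  score   = c(1,1:3,2:3,1,1,1,2,2,1,2,1,1,1,1,1:3),
  cluster = c("X", "Y", "Z")[c(1,1:3,3,2,1,1:3,1,1:3,3,3,3,1:3)]
)

# Hypothetical helper: percentage of each score within each area,
# optionally restricted to a subset of clusters (default: all clusters).
pct_by_area <- function(d, clusters = unique(d$cluster)) {
  d[cluster %in% clusters, .N, by = .(area, score)][
    , perc := 100 * N / sum(N), by = area][]
}

# Only records in cluster 'X'
x_only <- pct_by_area(dt, "X")

# Only records in clusters 'Y' and 'Z'
yz_only <- pct_by_area(dt, c("Y", "Z"))

# Wide layout comparable to prop.table(table(area, score), 1) * 100;
# fill = 0 is an assumption for combinations that do not occur.
wide <- dcast(pct_by_area(dt), area ~ score, value.var = "perc", fill = 0)
```

Filtering in `i` means the percentages are computed relative to the filtered rows in each area, so every area that survives the filter still sums to 100.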

