Grouping a table by multiple factors and spreading it from long format to wide: the data.table way in R
As an example, I will be using the mtcars data available in R:
library(data.table)  # provides setDT()
data(mtcars)
setDT(mtcars)        # convert the data.frame to a data.table by reference
Let's say I want to group the data by three variables, namely carb, cyl, and gear. I have done this as follows. However, I am sure there is a better way, as this is quite repetitive.
newDTcars <- mtcars [, mtcars[, mtcars[, .N , by = carb], by = cyl], by= gear]
Secondly, I would like to have the data in a wide format, where there is a separate column for every gear level. For illustration purposes I have done this using tidyr; however, I would like to have this done the "data.table" way.
newDTcars %>% tidyr::spread(gear, N)
The emphasis of this question is on keeping the solution within the data.table world, as I would like to learn more about data.table.
In data.table, we can group by multiple columns in a single call by passing them all to by, and we can use dcast to reshape the result from long to wide.
library(data.table)
dcast(mtcars[, .N, .(carb, cyl, gear)], carb+cyl~gear, value.var = "N")
# carb cyl 3 4 5
#1: 1 4 1 4 NA
#2: 1 6 2 NA NA
#3: 2 4 NA 4 2
#4: 2 8 4 NA NA
#5: 3 8 3 NA NA
#6: 4 6 NA 4 NA
#7: 4 8 5 NA 1
#8: 6 6 NA NA 1
#9: 8 8 NA NA 1
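To make the grouping step easier to see on its own, here is a minimal sketch of the long-format counts before reshaping. The names dt and counts are chosen here for illustration, and the sketch works on a copy made with as.data.table rather than converting mtcars in place.

```r
library(data.table)

# Work on a copy so the built-in data set is left untouched
dt <- as.data.table(mtcars)

# One grouping call: .N counts the rows in each (carb, cyl, gear) combination
counts <- dt[, .N, by = .(carb, cyl, gear)]

# Optional: sort by the grouping columns for easier reading
setorder(counts, carb, cyl, gear)
print(counts)
```

A single by = .(carb, cyl, gear) replaces the three nested calls from the question, and only combinations that actually occur in the data get a row.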
You may use the fill argument in dcast to replace the NAs with 0 or any other number.
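As a small sketch of the fill argument, assuming the same grouping as above (the names dt, counts, and wide are illustrative only):

```r
library(data.table)

dt <- as.data.table(mtcars)
counts <- dt[, .N, by = .(carb, cyl, gear)]

# fill = 0L puts a zero in every cell whose (carb, cyl, gear) combination
# never occurs; 0L is an integer zero, matching the integer type of .N
wide <- dcast(counts, carb + cyl ~ gear, value.var = "N", fill = 0L)
print(wide)
```

For counts, zero is usually the natural choice, since a missing combination simply means no cars fall into that cell.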