Using proportion as the aggregate function in data.table's dcast
When creating a pivot table using data.table, I am using the dcast function:
dcast(my_data, var1 ~ var2, length)
This gives a table with the var1 labels as rows, the var2 labels as columns, and, as each value, the count of cells common to that row and column.
Instead of length, I want to compute a proportion and use it as the value, i.e. {count of cells common to a particular row and column} divided by {count of all cells in the column, i.e. in that particular level of var2}.
I have searched but was not able to implement it. Any help would be appreciated.
There is a relatively simple solution, but it requires a second step after the dcast().
First, this is the data I am working on:
library(data.table)
set.seed(666)
my_data <- data.table(var1 = sample(letters[1:3], 10, TRUE),
var2 = sample(letters[4:6], 10, TRUE))
var1 var2
1: c f
2: a d
3: c d
4: a d
5: b d
6: c f
7: c d
8: b f
9: a e
10: a e
After the dcast
my_data_dcast <- dcast(my_data, var1 ~ var2, length)
the data looks like this:
var1 d e f
1: a 2 2 0
2: b 1 0 1
3: c 2 0 2
You can then simply go through all value columns and divide each element in a column by the sum of all values in that column.
Select the columns to transform:
cols <- unique(my_data$var2)
Go through the columns using lapply() on the subset of columns specified in .SDcols and overwrite the values of all cols:
my_data_dcast[, (cols) := (lapply(.SD, function(col) col / sum(col))),
.SDcols = cols]
The final result is this:
var1 d e f
1: a 0.4 1 0.0000000
2: b 0.2 0 0.3333333
3: c 0.4 0 0.6666667
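As a possible alternative to the two-step approach above (not part of the original answer), the proportions can be computed in long format first and then reshaped. This is a minimal sketch; the `toy` data set and the `counts`, `prop` and `wide` names are hypothetical and chosen only so that the numbers are easy to check by hand.

```r
library(data.table)

# Hypothetical toy data, small enough to verify by hand
toy <- data.table(var1 = c("a", "a", "b", "c", "c", "c"),
                  var2 = c("d", "e", "d", "d", "e", "e"))

# Count each (var1, var2) pair ...
counts <- toy[, .N, by = .(var1, var2)]
# ... then divide each count by the total of its var2 level
counts[, prop := N / sum(N), by = var2]

# Reshape to wide format; pairs that never occur are filled with 0
wide <- dcast(counts, var1 ~ var2, value.var = "prop", fill = 0)
print(wide)
```

Each value column of `wide` should sum to 1, which is a quick sanity check for column-wise proportions.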
We can use Reduce with `+` if we need a row-wise proportion (the data for this answer is defined at the end):
dcast(my_data, var1~ var2, length)[, .SD/Reduce(`+`, .SD), var1]
# var1 A B C D
#1: a 0.3750000 0.0000000 0.3750000 0.25
#2: b 0.6000000 0.2000000 0.2000000 0.00
#3: c 0.2857143 0.1428571 0.5714286 0.00
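To make the row-wise step easier to follow, here is the same expression applied to a small hand-written data set. This is only a sketch: `toy`, `wide` and `row_prop` are hypothetical names, and the data is not the data used in the answer. Within each `var1` group, `.SD` holds a single row, so `Reduce(`+`, .SD)` is that row's total.

```r
library(data.table)

# Hypothetical toy data, small enough to verify by hand
toy <- data.table(var1 = c("a", "a", "b", "c", "c", "c"),
                  var2 = c("d", "e", "d", "d", "e", "e"))

# Counts per (var1, var2) cell in wide format
wide <- dcast(toy, var1 ~ var2, fun.aggregate = length)

# Divide every count by the total of its row
row_prop <- wide[, .SD / Reduce(`+`, .SD), by = var1]
print(row_prop)
```

Every row of `row_prop` should now sum to 1 across the value columns.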
If we need column-wise proportions:
dcast(my_data, var1~ var2, length)[, .SD, var1][,
(2:5) := Map(`/`, .SD, colSums(.SD)), .SDcols = -1][]
# var1 A B C D
#1: a 0.375 0.0 0.375 1
#2: b 0.375 0.5 0.125 0
#3: c 0.250 0.5 0.500 0
This would be more compact with base R:
prop.table(table(my_data), 1)
prop.table(table(my_data), 2)
Data used in this answer:
set.seed(24)
my_data <- data.table(var1 = sample(letters[1:3], 20, replace = TRUE),
                      var2 = sample(LETTERS[1:4], 20, replace = TRUE))
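The second argument in the `prop.table()` calls above is the margin: 1 normalises within rows, 2 within columns. The following base R sketch illustrates the difference on a hypothetical data set (`toy`, `tab`, `row_prop` and `col_prop` are names invented for this example).

```r
# Hypothetical toy data, small enough to verify by hand
toy <- data.frame(var1 = c("a", "a", "b", "c", "c", "c"),
                  var2 = c("d", "e", "d", "d", "e", "e"))

# Contingency table of counts: var1 in rows, var2 in columns
tab <- table(toy)

# margin = 1: each row sums to 1 (row-wise proportions)
row_prop <- prop.table(tab, margin = 1)

# margin = 2: each column sums to 1 (column-wise proportions,
# which is what the question asks for)
col_prop <- prop.table(tab, margin = 2)

print(row_prop)
print(col_prop)
```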