Using a proportion as the aggregate function in data.table's dcast

Question

When creating a pivot table with data.table, I use the dcast function:

dcast(my_data, var1 ~ var2, length)

This gives a table with the var1 labels as rows, the var2 labels as columns, and, as each value, the count of records that fall in that row and column.

Instead of length, I want to calculate a proportion and use it as the value, i.e. {count of records in a particular row and column} divided by {count of all records in that column, i.e. at that level of var2}.

I have searched but have not been able to implement this. Any help would be appreciated.

Answer 1

There is a relatively simple solution, but it requires a second step after the dcast().

First, this is the data I am working with:

library(data.table)

set.seed(666)
my_data <- data.table(var1 = sample(letters[1:3], 10, TRUE),
                      var2 = sample(letters[4:6], 10, TRUE))

    var1 var2
 1:    c    f
 2:    a    d
 3:    c    d
 4:    a    d
 5:    b    d
 6:    c    f
 7:    c    d
 8:    b    f
 9:    a    e
10:    a    e
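
Note that the output of sample() for a given seed depends on the R version (the default sampling algorithm changed in R 3.6.0), so set.seed(666) may not reproduce the rows printed above. A minimal sketch that builds the same ten rows explicitly, copied from the printed table rather than generated:

```r
library(data.table)

# Rows copied by hand from the printed table above, so the example
# does not depend on the random number generator.
my_data <- data.table(
  var1 = c("c", "a", "c", "a", "b", "c", "c", "b", "a", "a"),
  var2 = c("f", "d", "d", "d", "d", "f", "d", "f", "e", "e")
)

# Quick check: number of rows per level of var2 (the future columns).
print(my_data[, .N, by = var2])
```

With this construction the later steps should give the same counts and proportions as shown in this answer.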

After the dcast()

my_data_dcast <- dcast(my_data, var1 ~ var2, length)

the data looks like this:

   var1 d e f
1:    a 2 2 0
2:    b 1 0 1
3:    c 2 0 2

You can then go through all the count columns and divide each element by the sum of its column.

Select the columns to transform:

cols <- unique(my_data$var2)

Apply lapply() to the subset of columns specified in .SDcols and overwrite the values of all cols:

my_data_dcast[, (cols) := (lapply(.SD, function(col) col / sum(col))),
              .SDcols = cols]

The final result is this:

   var1   d e         f
1:    a 0.4 1 0.0000000
2:    b 0.2 0 0.3333333
3:    c 0.4 0 0.6666667
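
The same column-wise proportions can also be computed in long form before casting, which avoids modifying the wide table afterwards. This is only a sketch of an alternative, not the method used in this answer; the names counts, n, prop, and result are made up for illustration, and the data is the ten rows printed earlier, typed in by hand:

```r
library(data.table)

# The ten rows from the printed table, entered explicitly.
my_data <- data.table(
  var1 = c("c", "a", "c", "a", "b", "c", "c", "b", "a", "a"),
  var2 = c("f", "d", "d", "d", "d", "f", "d", "f", "e", "e")
)

# 1. Count each (var1, var2) combination in long form.
counts <- my_data[, .(n = .N), by = .(var1, var2)]

# 2. Divide each count by the total of its var2 level (the future column).
counts[, prop := n / sum(n), by = var2]

# 3. Cast to wide form; combinations that never occur are filled with 0.
result <- dcast(counts, var1 ~ var2, value.var = "prop", fill = 0)
print(result)
```

Because the proportion is computed per var2 group before the cast, every value column of result should sum to 1.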

Answer 2

We can use Reduce with + if we need row-wise proportions:

dcast(my_data, var1~ var2, length)[, .SD/Reduce(`+`, .SD), var1]
#   var1         A         B         C    D
#1:    a 0.3750000 0.0000000 0.3750000 0.25
#2:    b 0.6000000 0.2000000 0.2000000 0.00
#3:    c 0.2857143 0.1428571 0.5714286 0.00
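
The row-wise variant can also be written without grouping, by dividing every count column by the row totals. A minimal sketch, assuming a small hand-made data set (not the data used in this answer) and the illustrative names wide, value_cols, and row_total:

```r
library(data.table)

# Hand-made example data (an assumption, not the answer's data).
my_data <- data.table(
  var1 = c("c", "a", "c", "a", "b", "c", "c", "b", "a", "a"),
  var2 = c("f", "d", "d", "d", "d", "f", "d", "f", "e", "e")
)

wide <- dcast(my_data, var1 ~ var2, length)

# Every column except the row label holds counts.
value_cols <- setdiff(names(wide), "var1")

# Total count in each row, i.e. for each level of var1.
row_total <- wide[, rowSums(.SD), .SDcols = value_cols]

# Divide every count column element-wise by the row totals.
wide[, (value_cols) := lapply(.SD, function(x) x / row_total),
     .SDcols = value_cols]
print(wide)
```

Selecting the columns by name rather than by position keeps the code independent of how many levels var2 has.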

If we need column-wise proportions:

dcast(my_data, var1~ var2, length)[, .SD, var1][, 
        (2:5) := Map(`/`, .SD, colSums(.SD)), .SDcols = -1][]
#   var1     A   B     C D
#1:    a 0.375 0.0 0.375 1
#2:    b 0.375 0.5 0.125 0
#3:    c 0.250 0.5 0.500 0

This would be more compact with base R:

prop.table(table(my_data), 1)
prop.table(table(my_data), 2)
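
To make the meaning of the second argument concrete, here is a small base R sketch on hand-made data (an assumption, not the data used in this answer); the names counts, row_prop, col_prop, and col_prop_df are illustrative only:

```r
# Hand-made data in a plain data.frame, so only base R is needed.
my_data <- data.frame(
  var1 = c("c", "a", "c", "a", "b", "c", "c", "b", "a", "a"),
  var2 = c("f", "d", "d", "d", "d", "f", "d", "f", "e", "e")
)

# Two-way contingency table: var1 in rows, var2 in columns.
counts <- table(my_data$var1, my_data$var2)

row_prop <- prop.table(counts, 1)  # margin 1: each row sums to 1
col_prop <- prop.table(counts, 2)  # margin 2: each column sums to 1

# Convert the column-wise table to a wide data.frame if one is needed.
col_prop_df <- as.data.frame.matrix(col_prop)
print(col_prop_df)
```

Margin 2 corresponds to what the question asks for: each count divided by the total for its level of var2.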

Data

set.seed(24)
my_data <- data.table(var1 = sample(letters[1:3], 20, replace = TRUE),
           var2 = sample(LETTERS[1:4], 20, replace = TRUE))


Answer 1: score 2, accepted, 2018-04-05 08:05:13

Answer 2: score 2, 2018-04-05 08:06:58
