Adding a count/tally of observations using mutate in R

Question

I wanted to know if there is a more efficient way to add a tally to a dataset in R.

This is how I currently do it, using the mpg dataset as an example.

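library(dplyr)
library(ggplot2)  # provides the mpg dataset

mpg %>% 
  group_by(manufacturer) %>% 
  count() %>%                           # one row per manufacturer, count in column n
  right_join(mpg, by = "manufacturer")  # join the counts back onto every row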

So essentially, I want a count of the observations for each unique value in the manufacturer column, attached to every row. The join works fine here because this is quite a small dataset, but I'm working with datasets of over 100k observations and would like a better way to do it than joining like this.

To give context, these per-group counts are used as denominators in subsequent analyses.

Answer 1

If you want to go fast, you can try data.table:

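library(data.table)
library(ggplot2)  # provides the mpg dataset

# .N is the number of rows in each group; := adds it as a new column
res <- data.table(mpg)[, ':='(cnt = .N), by = manufacturer]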
res
     manufacturer  model displ year cyl      trans drv cty hwy fl   class cnt
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact  18
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact  18
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact  18
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact  18
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact  18
 ---                                                                         
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize  27
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize  27
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize  27
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize  27
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize  27
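For readers new to data.table, here is a minimal sketch of what the call above does, on a small made-up table (the `dt` object, its values, and the `share` column are hypothetical and only for illustration). `.N` is the number of rows in the current group, and `:=` adds the column by reference, so no join is needed.

```r
library(data.table)

# Hypothetical toy data, not taken from the question
dt <- data.table(
  manufacturer = c("audi", "audi", "ford"),
  hwy          = c(29, 31, 17)
)

# Add the per-group row count to every row, in place
dt[, cnt := .N, by = manufacturer]

# The count can then be used as a per-group denominator,
# e.g. each row's share of its manufacturer's rows
dt[, share := 1 / cnt]

print(dt)
```

Note that `data.table(mpg)` in the answer creates a new data.table, so the original `mpg` tibble is left untouched.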

Benchmark (using @phiver's nice solution):

library(dplyr)
library(microbenchmark)

microbenchmark(dplyr      =  mpg %>% group_by(manufacturer) %>% add_tally() ,
               data.table =  data.table(mpg)[,':='(cnt = .N), by = manufacturer])

Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
      dplyr 8.201807 8.557434 9.599122 9.018660 9.922339 17.425479   100
 data.table 1.245440 1.370666 1.615039 1.470719 1.691733  6.391889   100
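If you would rather stay within dplyr, the join from the question can also be avoided there. The sketch below shows two ways that should give the same result; the column name `cnt` is just chosen to match the data.table example, and I have not benchmarked these variants against the timings above.

```r
library(dplyr)
library(ggplot2)  # only needed for the mpg dataset

# Option 1: add_count() adds the per-group count without collapsing rows
with_count <- mpg %>%
  add_count(manufacturer, name = "cnt")

# Option 2: the same thing written out with mutate() and n()
with_count2 <- mpg %>%
  group_by(manufacturer) %>%
  mutate(cnt = n()) %>%
  ungroup()

# Both keep all 234 rows of mpg and add one count column
head(with_count)
```

Either version keeps every original row, so `cnt` can be used directly as a denominator in a following `mutate()`.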
