简体   繁体   English

在 r 中使用 mutate 添加观察计数/计数

[英]Adding a tally/count of observations using mutate in r

I wanted to know if there was a more efficient way to add a tally to a dataset in R.我想知道是否有更有效的方法可以向 R 中的数据集添加计数。

Using the mpg dataset, this is how I do it using the mpg dataset as an example.使用 mpg 数据集,这就是我使用 mpg 数据集作为示例的方法。

mpg %>% 
  group_by(manufacturer) %>% 
  count() %>% 
  right_join(
    mpg
  )

So essentially, I want a count of the number of unique observations in the manufacturer column.所以本质上,我想要计算制造商列中唯一观察的数量。 It works fine as this is quite a small dataset, but I'm working with datasets with over 100k observations and wanted to find a better way to do it than to join in this way.它工作正常,因为这是一个相当小的数据集,但我正在使用具有超过 100k 观察值的数据集,并且希望找到一种比以这种方式加入更好的方法来做到这一点。

To give context, the number of unique observations are used as denominators for subsequent analyses.为了给出上下文,独特观察的数量被用作后续分析的分母。

If you want to go fast, you can try data.table :如果你想 go 快,你可以试试data.table

library(data.table) 
res <- data.table(mpg)[,':='(cnt = .N), by = manufacturer]
res
     manufacturer  model displ year cyl      trans drv cty hwy fl   class cnt
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact  18
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact  18
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact  18
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact  18
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact  18
 ---                                                                         
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize  27
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize  27
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize  27
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize  27
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize  27

Benchmark (using @phiver nice solution):基准测试(使用@phiver 不错的解决方案):

library(dplyr)
library(microbenchmark)

microbenchmark(dplyr      =  mpg %>% group_by(manufacturer) %>% add_tally() ,
               data.table =  data.table(mpg)[,':='(cnt = .N), by = manufacturer])

Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
      dplyr 8.201807 8.557434 9.599122 9.018660 9.922339 17.425479   100
 data.table 1.245440 1.370666 1.615039 1.470719 1.691733  6.391889   100

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM