Most efficient way of subsetting a data frame

Question

Can anyone suggest a more efficient way of subsetting a data frame without using SQL, indexing, or data.table options?

I looked for similar questions, and this one suggests an indexing option.

Here are some ways to subset, with timings.

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Subset and time
system.time(x <- dat[dat$x > 500, ])
#   user  system elapsed 
#  0.092   0.000   0.090 
system.time(x <- dat[which(dat$x > 500), ])
#   user  system elapsed 
#  0.040   0.032   0.070 
system.time(x <- subset(dat, x > 500))
#   user  system elapsed 
#  0.108   0.004   0.109
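
Beyond speed, the three calls are not fully interchangeable when the column contains missing values. The following is a minimal sketch on a small hypothetical data frame `d` (not part of the original question) showing the difference; on the dummy data above, which has no NA values, all three should return the same rows.

```r
# Hypothetical data frame with one missing value in x
d <- data.frame(x = c(100, 600, NA, 900), y = 1:4)

by_logical <- d[d$x > 500, ]         # an NA in the condition yields an all-NA row
by_which   <- d[which(d$x > 500), ]  # which() keeps only the positions that are TRUE
by_subset  <- subset(d, x > 500)     # subset() treats NA in the condition as FALSE

nrow(by_logical)  # 3
nrow(by_which)    # 2
identical(by_which$y, by_subset$y)
```

So part of the reason to prefer which() or subset() over a bare logical index may be the NA handling rather than the timing alone.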

EDIT: As Roland suggested, I used microbenchmark. It seems that which() performs best.

library("ggplot2")
library("microbenchmark")

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Benchmark
res <- microbenchmark( dat[dat$x > 500, ],
                       dat[which(dat$x > 500), ],
                       subset(dat, x > 500))
# Plot (the autoplot() generic from ggplot2 dispatches to the microbenchmark method)
autoplot(res)

[Figure: autoplot of the microbenchmark timings for the three subsetting calls]
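
The plot can also be backed by numbers. The sketch below assumes the microbenchmark package is installed; it uses a smaller dummy data frame, named expressions, and `times = 20L` purely so that it runs quickly, none of which comes from the original post. Absolute timings will differ by machine, so no expected output is shown.

```r
library("microbenchmark")

# Smaller dummy data and fewer repetitions, so the sketch runs quickly
set.seed(42)
dat <- data.frame(x = runif(100000, 1, 1000), y = runif(100000, 1, 1000))

res <- microbenchmark(
  logical = dat[dat$x > 500, ],
  which   = dat[which(dat$x > 500), ],
  subset  = subset(dat, x > 500),
  times = 20L
)

# summary() returns a data frame with one row per benchmarked expression
s <- summary(res, unit = "ms")

# Rank the three approaches by median time in milliseconds
s[order(s$median), c("expr", "median")]
```

Comparing medians rather than a single system.time() run reduces the influence of garbage collection and other one-off noise.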

Answer 1

As Roland suggested, I used microbenchmark. It seems that which() performs best.

library("ggplot2")
library("microbenchmark")

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Benchmark
res <- microbenchmark( dat[dat$x > 500, ],
                       dat[which(dat$x > 500), ],
                       subset(dat, x > 500))
# Plot (the autoplot() generic from ggplot2 dispatches to the microbenchmark method)
autoplot(res)

[Figure: autoplot of the microbenchmark timings for the three subsetting calls]

Solution 1
1 accepted 2013-09-12 09:25:09
