Is there a way to replace this loop in R with something that runs faster?
In short, I am trying to speed things up. This is my slow code:
library(dplyr)
tmp <- unique(kat$pnr) # Sort out the unique entries (ends up to about 572000)
sex = c()
for(i in tmp){ # For each unique pnr, look up the sex and append it to the new dataset
  temptable <- filter(kat, pnr == i)
  sex[i] <- temptable$sex
}
Currently the loop takes hours: tmp holds 572000 unique values to loop through, and shorter test runs suggest the system processes about 50 of them per second. So is there a way to replace this loop with something that runs faster?
The kat dataset has about 40 columns and 905000 rows. pnr is the unique identifier; however, one pnr can occur once or twice in kat. I want to do gender statistics, so I basically want to extract the unique pnrs and the sex of each pnr.
Using == inside filter for each unique element is slow, and doing it in a loop makes it worse. Instead, for this case, a group-by operation may be more appropriate if we want to compute some descriptive statistic on the 'sex' column for each unique element of 'pnr':
library(dplyr)
kat %>%
group_by(pnr) %>%
summarise(val = fn(sex))
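As one possible concrete choice for fn (an assumption, since the question does not say which summary is wanted), the sketch below takes the first recorded sex per pnr and then counts persons per sex. The data frame is a made-up toy stand-in for kat, not the real data.

```r
library(dplyr)

# Toy stand-in for kat; the values are invented for illustration only
kat <- data.frame(
  pnr = c(101, 102, 102, 103, 104, 104),
  sex = c("F", "M", "M", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# One row per pnr; first() is an assumed choice for the placeholder fn
per_person <- kat %>%
  group_by(pnr) %>%
  summarise(sex = first(sex))

# Gender statistics: number of unique persons per sex
sex_counts <- count(per_person, sex)
```

If sex never differs within a pnr, distinct(kat, pnr, sex) should give the same per-person table without any grouping.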
It can be made faster still with data.table:
library(data.table)
setDT(kat)[, .(val = fn(sex)), by = .(pnr)]
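A minimal data.table sketch of the same idea, again assuming that the wanted summary is simply the first sex value per pnr. The toy table and the names per_person and sex_counts are hypothetical.

```r
library(data.table)

# Toy stand-in for kat; the values are invented for illustration only
kat <- data.table(
  pnr = c(101, 102, 102, 103, 104, 104),
  sex = c("F", "M", "M", "F", "M", "M")
)

# One row per pnr; sex[1L] is an assumed choice for the placeholder fn
per_person <- kat[, .(sex = sex[1L]), by = .(pnr)]

# Gender statistics: number of unique persons per sex
sex_counts <- per_person[, .N, by = sex]
```

When only the first row per pnr is needed, unique(kat, by = "pnr") is another option that keeps all columns.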
NOTE: It is not clear which function should be applied to the 'sex' column; fn above is a placeholder.
If the intention is to create a list of sex values, one element per pnr, then:
lst1 <- split(kat$sex, kat$pnr)
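To illustrate what split returns, here is a small sketch on made-up data (not the real kat). The follow-up steps, counting rows per pnr and taking the first sex value, are assumptions about how the list might be used.

```r
# Toy stand-in for kat; the values are invented for illustration only
kat <- data.frame(
  pnr = c(101, 102, 102, 103),
  sex = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Named list: one element per pnr, holding that pnr's sex values
lst1 <- split(kat$sex, kat$pnr)

# How many rows each pnr contributed
n_per_pnr <- lengths(lst1)

# First sex value per pnr, as a named character vector
first_sex <- vapply(lst1, function(x) x[1], character(1))
```

The names of lst1 are the pnr values converted to character, so a single person can be looked up with lst1[["102"]].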