Is there a way to replace this loop in R with something that runs faster?

In short, I am trying to speed things up. This is my slow code:

library(dplyr)
tmp <- unique(kat$pnr) # Sort out the unique entries (ends up to about 572000)
sex = c()
for(i in tmp){         # For each unique pnr, look up the sex and append it to the new dataset
  temptable <- filter(kat, pnr == i)
  sex[i] <- temptable$sex
}

Currently the loop takes hours: tmp holds about 572000 unique values to loop through, and shorter test runs suggest the system processes about 50 of them per second. So is there a way to replace this loop with something that runs faster?

The kat dataset has about 40 columns and 905000 rows. pnr is the unique identifier, although one pnr can occur once or twice in kat. I want to do gender statistics, so I basically want to extract the unique pnrs and the sex of each pnr.

Comparing with == against each unique element inside filter is slow, and doing so in a loop makes it worse, because every iteration rescans all rows of kat. Instead, for this case, a group-by operation may be more appropriate if we want some descriptive statistics on the 'sex' column for each unique element of 'pnr':

library(dplyr)
kat %>%
    group_by(pnr) %>%
    summarise(val = fn(sex))
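
As a minimal sketch of the grouped approach, assuming every pnr carries the same sex on all of its rows, so that taking the first value per group is enough. Here first() is only one possible stand-in for fn, and the small kat below is made-up toy data, not the real dataset:

```r
library(dplyr)

# Toy stand-in for kat (assumed data): pnr 102 occurs twice, as in the question
kat <- data.frame(
  pnr = c(101, 102, 102, 103),
  sex = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# One row per pnr, keeping the first sex value seen for that pnr
sex_by_pnr <- kat %>%
  group_by(pnr) %>%
  summarise(sex = first(sex))

# Gender statistics on the de-duplicated table: number of unique pnrs per sex
sex_counts <- count(sex_by_pnr, sex)
sex_counts
```

If the rows for a pnr could disagree on sex, first() would silently pick one of them, so that assumption is worth checking on the real data.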

It can be made faster still with data.table:

library(data.table)
setDT(kat)[, .(val = fn(sex)), by = .(pnr)]
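
If the goal is simply one row per pnr with its sex, a data.table sketch might look like the following. It assumes the first row for each pnr is representative, and the kat built here is made-up toy data:

```r
library(data.table)

# Toy stand-in for kat (assumed data): pnr 102 occurs twice
kat <- data.table(
  pnr = c(101, 102, 102, 103),
  sex = c("F", "M", "M", "F")
)

# Keep the first row for each pnr, then only the two columns of interest
sex_by_pnr <- unique(kat, by = "pnr")[, .(pnr, sex)]

# Number of unique pnrs per sex
sex_counts <- sex_by_pnr[, .N, by = sex]
sex_counts
```

On the real data, setDT(kat) would convert the existing data frame in place instead of building a new table.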

NOTE: It is not clear which function should be applied to the 'sex' column, so fn is a placeholder.


If the intention is to create a list of the sex values for each pnr, then

lst1 <- split(kat$sex, kat$pnr)
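
One possible use of such a list, sketched in base R on made-up toy data, is to check whether any pnr has rows that disagree on sex before computing statistics. The conflicting values for pnr 104 are invented for illustration:

```r
# Toy stand-in for kat (assumed data); pnr 104 deliberately has conflicting values
kat <- data.frame(
  pnr = c(101, 102, 102, 104, 104),
  sex = c("F", "M", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of sex values per pnr
lst1 <- split(kat$sex, kat$pnr)

# Number of distinct sex values recorded for each pnr
n_distinct_sex <- lengths(lapply(lst1, unique))

# pnrs (as character names) whose rows disagree on sex
conflicting <- names(n_distinct_sex)[n_distinct_sex > 1]
conflicting
```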
