Is there a way to replace this loop in R with something that runs faster?

Question

In short, I am trying to speed things up. This is my slow code:

library(dplyr)
tmp <- unique(kat$pnr) # Sort out the unique entries (ends up to about 572000)
sex = c()
for(i in tmp){         # For each unique pnr, look up the sex and append it to the new dataset
  temptable <- filter(kat, pnr == i)
  sex[i] <- temptable$sex
}

Currently the loop takes hours, because tmp has 572000 entries to loop through and shorter test runs suggest the system processes about 50 of them per second. So is there a way to replace this loop with something that runs faster?

The kat dataset has about 40 columns and 905000 rows. pnr is the identifier, although a given pnr can occur once or twice in kat. I want to do gender statistics, so I basically want to extract the unique pnrs and the sex of each pnr.

Answer 1

Comparing with == inside filter for each unique element is slow, and doing it in a loop makes it slower still. For this case, a group-by operation may be more appropriate if we want some descriptive statistics on the 'sex' column for each unique element of 'pnr':

library(dplyr)
# fn is a placeholder for the summary function to apply to sex
kat %>%
    group_by(pnr) %>%
    summarise(val = fn(sex))
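As a hedged illustration of the group-by approach, the sketch below uses a small made-up stand-in for kat and picks dplyr's first() as the function in place of fn. That choice is an assumption: it only makes sense if the sex recorded for a pnr is the same on every row where that pnr appears.

```r
library(dplyr)

# Toy stand-in for kat (hypothetical values); pnr 1 and 3 occur twice
kat <- data.frame(
  pnr = c(1, 1, 2, 3, 3),
  sex = c("F", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# first() plays the role of fn: keep the first sex recorded for each pnr
sex_by_pnr <- kat %>%
  group_by(pnr) %>%
  summarise(sex = first(sex))

# Gender counts over the unique pnrs
sex_counts <- count(sex_by_pnr, sex)
```

If sex really is constant within each pnr, distinct(kat, pnr, sex) should give the same pnr-to-sex table in a single step.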

It can be made faster still with data.table:

library(data.table)
setDT(kat)[, .(val = fn(sex)), by = .(pnr)]
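A minimal data.table sketch along the same lines, again on made-up toy data. Here sex[1L] stands in for fn, which assumes that taking the first recorded sex per pnr is what is wanted.

```r
library(data.table)

# Toy stand-in for kat (hypothetical values); pnr 1 and 3 occur twice
kat <- data.frame(
  pnr = c(1, 1, 2, 3, 3),
  sex = c("F", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference, then take the first sex per pnr
setDT(kat)
sex_by_pnr <- kat[, .(sex = sex[1L]), by = pnr]

# Gender counts over the unique pnrs (.N is the group size)
sex_counts <- sex_by_pnr[, .N, by = sex]
```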

NOTE: It is not clear which function should be applied to the 'sex' column, so fn is only a placeholder.

If the intention is to create a list of the sex values for each pnr, then:

lst1 <- split(kat$sex, kat$pnr)
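To show what split returns, and as one possible base R route to the gender statistics described in the question, here is a small sketch on made-up data. The duplicated() step is an addition that does not come from the answer above, and it assumes the first row for each pnr carries the right sex.

```r
# Toy stand-in for kat (hypothetical values); pnr 1 and 3 occur twice
kat <- data.frame(
  pnr = c(1, 1, 2, 3, 3),
  sex = c("F", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Named list with one element per pnr, holding every sex value recorded for it
lst1 <- split(kat$sex, kat$pnr)

# Vectorized alternative to the loop: keep the first row seen for each pnr
unique_sex <- kat[!duplicated(kat$pnr), c("pnr", "sex")]

# Gender counts over the unique pnrs
sex_counts <- table(unique_sex$sex)
```

The list keeps both values when a pnr occurs twice, which may be useful for checking whether any pnr has conflicting sex entries before counting.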

Answer 1, dated 2019-08-08 14:14:10
