Vectorizing for-loop in R

Oh, man. I am so terrible at removing for-loops from my code because I find them so intuitive, and I first learned C++. Below, I am fetching IDs for a search (copd in this case), using each ID to retrieve its full XML file, and saving the location from that file into a vector. I do not know how to speed this up: it took about 5 minutes to run on 700 IDs, whereas most searches have 70,000+ IDs. Thank you for any and all guidance.

library(rentrez)
library(XML)

# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count

# set max to count
id <- entrez_search(db = "pubmed", term = "copd", retmax = count)$ids

# empty vector that will soon contain locations
location <- character()

# get all location data 
for (i in 1:count)
{
  # get ID of each search
  test <- entrez_fetch(db = "pubmed", id = id[i], rettype = "XML")

  # convert to XML
  test_list <- XML::xmlToList(test)

  # retrieve location
  location <- c(location, test_list$PubmedArticle$MedlineCitation$Article$AuthorList$Author$AffiliationInfo$Affiliation)
}

This may give you a start: it seems to be possible to pull down multiple records at once.

library(rentrez)
library(xml2)

# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count

# set max to count
id_search <- entrez_search(db = "pubmed", term = "copd", retmax = count, use_history = TRUE)

# get all
document <- entrez_fetch(db = "pubmed", rettype = "XML", web_history = id_search$web_history)

document_list <- as_list(read_xml(document))

The problem is that this is still time consuming, because there are a large number of documents. It's also curious that it returned exactly 10,000 articles when I tried this; there may be a limit to how much you can return at once.
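
If requests against a web history are capped at 10,000 records, one workaround is to page through the result set in chunks. A minimal sketch, reusing count and id_search from above and assuming entrez_fetch forwards retstart and retmax to the E-utilities (the chunk size and the Sys.sleep delay are illustrative choices, not required values):

library(rentrez)
library(xml2)

# page through the stored history instead of fetching everything at once
chunk_size <- 10000   # illustrative; stays at the apparent per-request cap
starts <- seq(0, count - 1, by = chunk_size)

# fetch and parse one chunk of records at a time
documents <- lapply(starts, function(start) {
  chunk <- entrez_fetch(db = "pubmed", rettype = "XML",
                        web_history = id_search$web_history,
                        retstart = start, retmax = chunk_size)
  Sys.sleep(0.5)  # pause between requests to respect NCBI's rate limits
  read_xml(chunk)
})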

You can then use something like the purrr package to start extracting the information you want.
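
For example, here is a minimal sketch of pulling the first listed affiliation out of each article (the node path mirrors the one in the question, document_list is the parsed list from above, and pluck simply returns NULL for articles without an AffiliationInfo node):

library(purrr)

# document_list$PubmedArticleSet holds one list element per article
affiliations <- map(document_list$PubmedArticleSet, function(article) {
  pluck(article, "MedlineCitation", "Article", "AuthorList",
        "Author", "AffiliationInfo", "Affiliation", 1)
})

# drop the NULLs and flatten to a character vector
location <- unlist(compact(affiliations), use.names = FALSE)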
