
Can someone explain the underlying methods of R?

I'm coming from a background in Python and C++, and R seems to use magic that I don't understand. I was hoping someone could give me some insight into how it works.

I was tasked with applying an algorithm to each row of a tibble of about 3,400,000 data points. Coming from C++, my instinct was to iterate over the table, compute each value manually, and write it back into the tibble like this:

library(dplyr)  # for filter()

add_elev <- function(all, elev){
  row <- 1
  while (row < nrow(all)) {
    # Look up the elevation row(s) for the lake at the current position
    adder <- filter(elev, lake_id == all[row, "lake_id"][[1]])
    curr_id <- all[row, "lake_id"][[1]]

    # Copy that elevation into every consecutive row with the same lake_id
    while (all[row, "lake_id"][[1]] == curr_id) {
      all[row, "elevation"] <- adder[1, "elevation"][[1]]
      row <- row + 1

      if (row > nrow(all)) {
        break
      }
      if (all[row, "lake_id"][[1]] != curr_id) {
        break
      }
    }

    if (row > nrow(all)) {
      break
    }
  }
  return(all)
}

The function works, but it was estimated to take about 9 hours. After looking around in some reference books, I found that I could accomplish the same thing with a single line: all <- left_join(all, elevation, by = "lake_id"). This finished in less than a second, and all 3,400,000 data points appear to be correct. The only way I could think of doing this was through iteration, so I have no idea how that one small line of code finished so quickly. Can someone explain the magic of these tibbles to me?
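For reference, a minimal sketch of the join version (assuming dplyr is loaded, and that the lookup table, called elev in the function above and elevation in the one-liner, holds one elevation value per lake_id):

library(dplyr)

# One vectorized join replaces the whole row-by-row loop:
# every row of all picks up the matching elevation from the lookup table by lake_id.
all <- left_join(all, elevation, by = "lake_id")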

R's magic is its vectorized approach to working on whole variables at once. It is much faster than writing native looping structures that do the same thing, because the per-element iteration happens inside optimized compiled code rather than in interpreted R.
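As a small illustration (a sketch on made-up data, not your tibble):

x <- runif(1e6)

# Explicit loop, C++ style: interpreted R code runs once per element
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- x[i] * 2 + 1
}

# Vectorized: one call, the per-element work runs in compiled code
out_vec <- x * 2 + 1

identical(out_loop, out_vec)  # TRUE, but the vectorized version is far faster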

Vectorization sometimes uses recycling to make data structures the same size so the operation can be carried out in one pass. Element-by-element assignments (like in your example) tend to trigger copies of the variables being modified, which slows processing down considerably.
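Both points can be seen in a short sketch (base R only; tracemem reports when an object gets copied):

# Recycling: the shorter vector is reused so both operands match in length
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24

# Copy-on-modify: assigning into a single element can copy the whole object
df <- data.frame(a = 1:5, b = 6:10)
tracemem(df)                 # start tracing copies of df
df[1, "a"] <- 99             # tracemem prints a message when df is copied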
