
Can someone explain the underlying methods of R?

I'm coming from a background in Python and C++, and R seems to use magic that I don't understand. I was hoping someone would be able to give me some insight into how it works.

I was tasked with applying an algorithm to each row in a tibble of about 3,400,000 data points. Coming from C++, I thought to iterate over the table, calculate each value manually, and enter it into the tibble like this:

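library(dplyr)

# Fill all$elevation row by row from the lookup table elev.
# Assumes rows of `all` with the same lake_id are adjacent.
add_elev <- function(all, elev){
  row <- 1
  while (row <= nrow(all)){
    # Look up the elevation once per run of identical lake_id values
    curr_id <- all[row, "lake_id"][[1]]
    adder <- filter(elev, lake_id == curr_id)
    while (all[row, "lake_id"][[1]] == curr_id){
      # One single-cell assignment per row; this is the slow part
      all[row, "elevation"] <- adder[1, "elevation"][[1]]
      row <- row + 1
      if (row > nrow(all)){
        break  # stop before indexing past the last row
      }
    }
  }
  all
}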

The function works, but it was estimated to take about 9 hours. After looking through some reference books, I found that I could accomplish the same thing with a single line: "all <- left_join(all, elevation, by = "lake_id")". This finished in less than a second, and seemingly all 3,400,000 data points were correct. Iteration was the only way I could think of to do this, so I have no idea how that one line of code finished so quickly. Can someone explain to me the magic of these tibbles?

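R's magic is vectorization: functions operate on whole vectors or columns at once, and the loop over elements runs inside compiled code rather than in the R interpreter. That is much faster than writing an explicit R-level loop that does the same thing.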
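
As a rough illustration of the difference, here is a minimal sketch in base R. The toy data frames `all_rows` and `elev` and their values are made up for this example. `match()` stands in for the lookup that `left_join` performs; it is not a description of how dplyr implements the join internally.

```r
# Toy data (hypothetical values, not from the question)
all_rows <- data.frame(lake_id = c(1, 1, 2, 3, 3, 3))
elev     <- data.frame(lake_id = c(1, 2, 3),
                       elevation = c(120.5, 98.0, 310.2))

# Loop version: one lookup and one assignment per row, all in interpreted R
loop_elev <- numeric(nrow(all_rows))
for (i in seq_len(nrow(all_rows))) {
  loop_elev[i] <- elev$elevation[elev$lake_id == all_rows$lake_id[i]]
}

# Vectorized version: match() finds every row's position in elev in one call,
# and a single indexing step pulls out all the elevations at once
vec_elev <- elev$elevation[match(all_rows$lake_id, elev$lake_id)]

# Both approaches produce the same column
identical(loop_elev, vec_elev)
```

On six rows the two versions are indistinguishable in speed; the gap only becomes visible as the row count grows toward the millions mentioned in the question.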

Vectorized operations sometimes rely on recycling, where a shorter vector is repeated to match the length of a longer one so the operation can be applied to every element in one step. Element-by-element assignments (like in your example) tend to require copies of the object being modified, which slows down processing.
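
Both points can be sketched in a few lines of base R. The vector `x` and the variable names are made up for illustration. Growing a vector with `c()` is used here as a simple case where every iteration is known to allocate a new object; whether a particular in-place assignment copies depends on the object and the R version.

```r
x <- c(1, 2, 3, 4)

# Recycling: the length-1 vector 10 is reused for every element of x
x * 10         # 10 20 30 40

# A length-2 vector is recycled as 1, 2, 1, 2
x + c(1, 2)    # 2 4 4 6

# Element-by-element construction: c() builds a new vector on every iteration
squares_loop <- numeric(0)
for (v in x) {
  squares_loop <- c(squares_loop, v^2)
}

# Vectorized equivalent: one call, one result vector
squares_vec <- x^2

identical(squares_loop, squares_vec)
```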

