
Adding new variables to a data frame using dplyr select and mutate functions in a for loop

Please let me know if this is a duplicate. I've looked over Stack and have found questions that are similar, but don't quite answer my question. I'm a beginner, so I appreciate anyone's help.

I am trying to add yearly summary variables to a data frame of monthly data (i.e., summing across the monthly columns within each row). Here is an example data frame with limited monthly data.

df <- data.frame("Jan.2012" = c(1, 4, 5, 6), "Feb.2012" = c(3, 5, 7, 9),
 "Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13), "Jan.2014" = c(6, 8, 9, 11), 
 "Feb.2014" = c(7, 3, 5, 9))

The new variables would be named TotalYr2012, TotalYr2013, etc. For example, TotalYr2012 = c(4, 9, 12, 15).

I am trying to use a for loop (not best practice, I know) to generate these variables. I know I am also doing something wrong with the assign statement, since I get an error.

for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
    assign(df$varname, df %>% select(contains("i")) %>% 
     mutate(varname = sum()))
}

Thanks for your help!

You can avoid the for loop by using the tidyr::gather and zoo::as.yearmon functions to calculate a summary for each year.

library(tidyverse)
library(zoo)
library(lubridate)  # provides year(); older tidyverse versions do not attach lubridate

df %>% gather(Date, value) %>%
  mutate(Date = as.yearmon(Date,"%b.%Y")) %>%
  group_by(Year = year(Date)) %>%
  summarise(Total = sum(value)) %>%
  spread(Year, Total)

# # A tibble: 1 x 3
#     `2012` `2013` `2014`
# *   <dbl>  <dbl>  <dbl>
# 1   40.0   69.0   58.0  
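The pipeline above returns one total per year across all rows. The question asks for per-row totals (for example, TotalYr2012 = c(4, 9, 12, 15)), so here is a minimal sketch of one way to get those. It assumes dplyr 1.0 or later and tidyr 1.0 or later. The helper column `row` and the names `yearly` and `result` are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame("Jan.2012" = c(1, 4, 5, 6), "Feb.2012" = c(3, 5, 7, 9),
                 "Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13),
                 "Jan.2014" = c(6, 8, 9, 11), "Feb.2014" = c(7, 3, 5, 9))

yearly <- df %>%
  mutate(row = row_number()) %>%               # remember which row each value came from
  pivot_longer(-row,
               names_to = c("Month", "Year"),  # "Jan.2012" -> Month "Jan", Year "2012"
               names_sep = "\\.",
               values_to = "value") %>%
  group_by(row, Year) %>%
  summarise(Total = sum(value), .groups = "drop") %>%
  pivot_wider(names_from = Year, values_from = Total,
              names_prefix = "TotalYr")

# Attach the yearly totals to the original monthly columns
result <- bind_cols(df, select(yearly, -row))
result$TotalYr2012
# [1]  4  9 12 15
```

Grouping by both `row` and `Year` keeps one total per original row. Dropping `row` from the `group_by()` call would give totals across all rows, like the output shown above.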

You've got a lot of problems.

  1. $ doesn't work with a column name stored in a variable: df$varname looks for a column literally named "varname". Use [ or [[ instead. See this R-FAQ for additional info. Also your indentation is messy, let's make it consistent:

     for (i in 2012:2014) {
       varname <- paste("TotalYr", i, sep = "")
       assign(df[[varname]], df %>% select(contains("i")) %>% mutate(varname = sum()))
     }
  2. assign isn't needed, just use <- (or =). You should almost never use assign().

     for (i in 2012:2014) {
       varname <- paste("TotalYr", i, sep = "")
       df[[varname]] <- df %>% select(contains("i")) %>% mutate(varname = sum())
     }
  3. "i" is a string, its value is always the letter "i", just like the value of 2 is always 2. In contains() you want to use the value you have assigned to the object i (though that value does need to be a string, that is, it needs to be of class character):

     for (i in 2012:2014) {
       varname <- paste("TotalYr", i, sep = "")
       df[[varname]] <- df %>% select(contains(as.character(i))) %>% mutate(varname = sum())
     }
  4. mutate and select return data frames, which means your code is trying to assign a whole data frame to df[[varname]]. We want to assign just a column vector, not a data frame. So we pull out the column vector with dplyr::pull:

     for (i in 2012:2014) {
       varname <- paste("TotalYr", i, sep = "")
       df[[varname]] <- df %>% select(contains(as.character(i))) %>% mutate(varname = sum()) %>% pull()
     }
  5. Type sum() in your console - you get 0. You need to give sum() something to add up. Let's get rid of mutate entirely and just sum the pulled vector, this way we don't have to worry about its name:

     for (i in 2012:2014) {
       varname <- paste("TotalYr", i, sep = "")
       df[[varname]] <- df %>% select(contains(as.character(i))) %>% pull %>% sum
     }
  6. Okay, it sort of works now. But you're adding these new values to the old data frame, which has a bunch of rows. The new values are just single values, so they're "recycled", repeated on every row of the data frame. Let's create a new result data frame instead, which will have just one row for our result:

     result = list()
     for (i in 2012:2014) {
       varname <- paste("TotalYr", i, sep = "")
       result[[varname]] <- df %>% select(contains(as.character(i))) %>% pull %>% sum
     }
     result = as.data.frame(result)
     result
     #   TotalYr2012 TotalYr2013 TotalYr2014
     # 1          24          36          24
     # Note: pull() with no arguments returns only the last selected column
     # (here the Feb column), so these totals do not include the Jan values.
  7. Now it runs and gives a result. However, it is still messy. for loops usually aren't necessary. We have much better tools for tidying data.

     # See MKR's answer. It's the way you should actually do this. 
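If a loop is still preferred, the steps above can be combined into a version that produces the per-row totals described in the question. This is only a sketch under a few assumptions: it swaps in rowSums() instead of pull() and sum(), and it matches columns with ends_with() so that re-running the loop does not pick up the TotalYr columns it has already added.

```r
library(dplyr)

# Example data from the question
df <- data.frame("Jan.2012" = c(1, 4, 5, 6), "Feb.2012" = c(3, 5, 7, 9),
                 "Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13),
                 "Jan.2014" = c(6, 8, 9, 11), "Feb.2014" = c(7, 3, 5, 9))

for (i in 2012:2014) {
  varname <- paste0("TotalYr", i)
  df[[varname]] <- df %>%
    select(ends_with(paste0(".", i))) %>%  # e.g. Jan.2012 and Feb.2012
    rowSums() %>%                          # one sum per row, not one overall sum
    unname()                               # drop the row names rowSums() attaches
}

df$TotalYr2012
# [1]  4  9 12 15
```

Because rowSums() returns one value per row, the result has the same length as the data frame and nothing is recycled.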
