Please let me know if this is a duplicate. I've searched Stack Overflow and found similar questions, but none that quite answer mine. I'm a beginner, so I appreciate anyone's help.
I am trying to add yearly summary variables to a data frame of monthly data (i.e., summing across rows). Here is an example data frame with limited monthly data.
df <- data.frame("Jan.2012" = c(1, 4, 5, 6), "Feb.2012" = c(3, 5, 7, 9),
"Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13), "Jan.2014" = c(6, 8, 9, 11),
"Feb.2014" = c(7, 3, 5, 9))
The new variables would be named TotalYr2012, TotalYr2013, etc. For example, TotalYr2012 = c(4, 9, 12, 15), etc.
I am trying to use a for loop (not best practice, I know) to generate these variables. I know I am doing something wrong with the assign statement, because I get an error.
for (i in 2012:2014) {
varname <- paste("TotalYr", i, sep = "")
assign(df$varname, df %>% select(contains("i")) %>%
mutate(varname = sum()))
}
Thanks for your help!
You can avoid the for loop by using the tidyr::gather and zoo::yearmon functions to calculate a summary for each year.
library(tidyverse)
library(lubridate)  # provides year()
library(zoo)
df %>%
  gather(Date, value) %>%
  mutate(Date = as.yearmon(Date, "%b.%Y")) %>%
  group_by(Year = year(Date)) %>%
  summarise(Total = sum(value)) %>%
  spread(Year, Total)
# # A tibble: 1 x 3
# `2012` `2013` `2014`
# * <dbl> <dbl> <dbl>
# 1 40.0 69.0 58.0
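The question asks for one total per row per year, while the pipeline above collapses every row into a single total per year. A minimal sketch of a per-row variant, assuming a tidyr version that provides pivot_longer and pivot_wider (which supersede gather and spread); the names row, yearly and df_with_totals are introduced here only for illustration:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  "Jan.2012" = c(1, 4, 5, 6),  "Feb.2012" = c(3, 5, 7, 9),
  "Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13),
  "Jan.2014" = c(6, 8, 9, 11), "Feb.2014" = c(7, 3, 5, 9)
)

yearly <- df %>%
  mutate(row = row_number()) %>%                 # remember which row each value came from
  pivot_longer(-row,
               names_to = c("month", "year"),    # split "Jan.2012" into "Jan" and "2012"
               names_sep = "\\.") %>%
  group_by(row, year) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = total,
              names_prefix = "TotalYr")          # one TotalYr<year> column per year

# Attach the totals to the original monthly columns
df_with_totals <- bind_cols(df, select(yearly, -row))
df_with_totals
```

With the example data, TotalYr2012 comes out as 4, 9, 12, 15, which matches the values given in the question.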
You've got a lot of problems.
$ doesn't work with variables. Use [ or [[ instead. See this R-FAQ for additional info. Also your indentation is messy, let's make it consistent:
for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
  assign(df[[varname]], df %>% select(contains("i")) %>%
    mutate(varname = sum()))
}
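The difference between $ and [[ is easy to check in the console. A minimal sketch in base R, using a two-column subset of the question's data; the names dollar_result and bracket_result are only for illustration:

```r
df <- data.frame(Jan.2012 = c(1, 4, 5, 6), Feb.2012 = c(3, 5, 7, 9))
varname <- "Jan.2012"

# $ takes the text after it literally, so this looks for a column named "varname"
dollar_result <- df$varname

# [[ evaluates its argument, so this uses the string stored in varname
bracket_result <- df[[varname]]

is.null(dollar_result)   # TRUE: there is no column called "varname"
bracket_result           # 1 4 5 6
```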
assign isn't needed, just use <- (or =). You should almost never use assign().
for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
  df[[varname]] <- df %>% select(contains("i")) %>%
    mutate(varname = sum())
}
"i" is a string, its value is always the letter "i", just like the value of 2 is always 2. Even in contains() you want to use the value you have assigned to the object i (though that value does need to be a string, that is, it needs to be of class character):
for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
  df[[varname]] <- df %>% select(contains(as.character(i))) %>%
    mutate(varname = sum())
}
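To see why the quotes matter, compare what each call selects. A small sketch, assuming dplyr is installed and using the question's df; the names literal_match and value_match are illustrative only:

```r
library(dplyr)

df <- data.frame(
  "Jan.2012" = c(1, 4, 5, 6),  "Feb.2012" = c(3, 5, 7, 9),
  "Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13),
  "Jan.2014" = c(6, 8, 9, 11), "Feb.2014" = c(7, 3, 5, 9)
)
i <- 2012

# The string "i": no column name contains the letter i, so nothing is selected
literal_match <- select(df, contains("i"))

# The value of i, converted to a string: selects the 2012 columns
value_match <- select(df, contains(as.character(i)))

ncol(literal_match)   # 0
names(value_match)    # "Jan.2012" "Feb.2012"
```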
mutate and select return data frames, which means your code is trying to assign a whole data frame to df[[varname]]. We want to assign just a column vector, not a data frame. So we pull out the column vector with dplyr::pull:
for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
  df[[varname]] <- df %>% select(contains(as.character(i))) %>%
    mutate(varname = sum()) %>%
    pull()
}
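The difference between a one-column data frame and a plain vector can be checked directly. A minimal sketch, assuming dplyr is installed; the names as_frame and as_vector are introduced here for illustration:

```r
library(dplyr)

df <- data.frame(Jan.2012 = c(1, 4, 5, 6), Feb.2012 = c(3, 5, 7, 9))

as_frame  <- select(df, Feb.2012)   # still a data frame, just with one column
as_vector <- pull(df, Feb.2012)     # the column's values as a numeric vector

is.data.frame(as_frame)    # TRUE
is.data.frame(as_vector)   # FALSE
as_vector                  # 3 5 7 9
```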
Type sum() in your console - you get 0. You need to give something to sum(). Let's get rid of mutate entirely and just sum the pulled vector, this way we don't have to worry about its name:
for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
  df[[varname]] <- df %>% select(contains(as.character(i))) %>% pull %>% sum
}
Okay, it sort of works now. (Note that pull with no column argument extracts only the last selected column, so each value here is the sum of that year's Feb column alone.) But you're adding these new values to the old data frame, which has a bunch of rows. The new values are just single values, so they're "recycled", repeated on every row of the data frame. Let's create a new result data frame instead which will have just one row for our result:
result = list()
for (i in 2012:2014) {
  varname <- paste("TotalYr", i, sep = "")
  result[[varname]] <- df %>% select(contains(as.character(i))) %>% pull %>% sum
}
result = as.data.frame(result)
result
#   TotalYr2012 TotalYr2013 TotalYr2014
# 1          24          36          24
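If the goal is the per-row yearly totals described in the question, covering every month of each year, one option is base R's rowSums over the columns that match the year. This is a sketch under the assumption that every monthly column name contains its four-digit year; month_cols and year_cols are names made up for this example:

```r
df <- data.frame(
  "Jan.2012" = c(1, 4, 5, 6),  "Feb.2012" = c(3, 5, 7, 9),
  "Jan.2013" = c(6, 8, 9, 10), "Feb.2013" = c(7, 5, 11, 13),
  "Jan.2014" = c(6, 8, 9, 11), "Feb.2014" = c(7, 3, 5, 9)
)

# Record the monthly column names first, so the new total columns are never matched
month_cols <- names(df)

for (i in 2012:2014) {
  varname <- paste0("TotalYr", i)
  # Names of the monthly columns for this year, e.g. "Jan.2012" and "Feb.2012"
  year_cols <- month_cols[grepl(as.character(i), month_cols, fixed = TRUE)]
  # Sum across those columns within each row
  df[[varname]] <- unname(rowSums(df[year_cols]))
}

df$TotalYr2012   # 4 9 12 15
```

Summing each new column, for example with colSums(df[c("TotalYr2012", "TotalYr2013", "TotalYr2014")]), gives 40, 69 and 58, the same yearly totals as in the gather-based answer.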
Now it runs and returns a one-row data frame. However, it is still messy. for loops usually aren't necessary. We have much better tools for tidying data.
# See MKR's answer. It's the way you should actually do this.