
For each column in R data frame

I was wondering how for loops work in R data frames. This is not a reproducible example, but I'm wondering if the concept can work. If df has a Date, ID, Amount, and 4 variables, can I loop through the columns? I need to remove NA rows from columns Var1 to Var4, create a "weight vector" based off of the Amount column, then calculate the weighted mean.

a<- names(df)
a<- a[4:7]

a
[1] "Var1" "Var2" "Var3" "Var4"


#df has Date, ID, Amount ,Var1, Var2, Var3, Var4

for(i in a) {

  NEW <-df[ !is.na(df$i), ]
  NEW <- NEW %>%
    group_by(Date) %>%
    mutate(Weights = Amount/sum(Amount))

  SUM <-  NEW %>%
    group_by(Date) %>%
    summarise(Value = weighted.mean(i, Weights))

  write.csv(SUM , paste0(i, ".csv"))

}

You can loop through the columns; you only need a few adjustments to your syntax. To index your data frame with a column name stored in a variable (in your loop, the names are stored in the loop variable i), you can access the column in the following ways:

1.) With base-R subset syntax, use [, i] to select the column you want:

df[,i]

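NOTE: df$i will not work here, because $ takes the text after it literally and looks for a column named "i" instead of using the value stored in i. On a plain data frame with no such column it returns NULL. df[[i]] also works and always returns the column as a vector.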
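To make the difference concrete, here is a minimal sketch on a made-up two-column data frame (the values are invented for illustration and are not from the question). It compares the three ways of indexing and then filters out the NA rows for the chosen column:

```r
# Toy data frame; the values are made up purely for illustration.
df <- data.frame(
  Amount = c(10, 20, 30),
  Var1   = c(1.5, NA, 3.5)
)

i <- "Var1"   # column name stored in a variable, as in the loop

by_bracket <- df[, i]   # the Var1 column as a numeric vector
by_double  <- df[[i]]   # same vector; [[ ]] always returns the column itself
by_dollar  <- df$i      # NULL: `$` looks for a column literally named "i"

# Keep only the rows where the chosen column is not NA
NEW <- df[!is.na(df[[i]]), ]
```

With df$i the filter would silently select zero rows, which is why the loop in the question produces empty output rather than an error.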

2.) In dplyr functions, you have to convert the character string stored in i into a symbol (a "name") that refers to a column of your data frame. The function as.name does this. The symbol then has to be unquoted so that the dplyr function evaluates it as a column, which is what the !! ("bang-bang") operator does:

df %>% select(!!as.name(i))

or in your case:

SUM <-  NEW %>%
   group_by(Date) %>%
   summarise(Value = weighted.mean(!!as.name(i), Weights))
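As a small self-contained check of this pattern, the sketch below runs the summarise step on invented data (the dates and numbers are assumptions for illustration only). It also shows the .data[[i]] pronoun, which dplyr provides as an alternative way to refer to a column by a string; either form should give the same result:

```r
library(dplyr)

# Made-up data; only the column names follow the question.
df <- data.frame(
  Date   = c("2020-01-01", "2020-01-01", "2020-01-02"),
  Amount = c(10, 30, 5),
  Var1   = c(1, 3, 7)
)

i <- "Var1"   # column name held as a string

# Per-date weights, computed as in the question
NEW <- df %>%
  group_by(Date) %>%
  mutate(Weights = Amount / sum(Amount))

# Variant 1: convert the string to a symbol and unquote it with !!
res_bang <- NEW %>%
  summarise(Value = weighted.mean(!!as.name(i), Weights))

# Variant 2: look the column up by name through the .data pronoun
res_data <- NEW %>%
  summarise(Value = weighted.mean(.data[[i]], Weights))
```

Which variant to use is mostly a matter of taste; .data[[i]] avoids the explicit conversion step.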

Otherwise your syntax seems fine: just loop through the set of names and index the data frame in the ways described above. Hope this answers your question.
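Putting both fixes together, the loop from the question could look roughly like the sketch below. The data frame is invented so that the example runs on its own, and writing to tempdir() through the hypothetical out_dir variable is an assumption made here to avoid cluttering the working directory; the results list is likewise only added for inspection:

```r
library(dplyr)

# Made-up data with the column layout described in the question.
df <- data.frame(
  Date   = rep(c("2020-01-01", "2020-01-02"), each = 2),
  ID     = 1:4,
  Amount = c(10, 30, 20, 20),
  Var1   = c(1, 3, NA, 4),
  Var2   = c(2, 2, 6, 8),
  Var3   = c(NA, 5, 5, 5),
  Var4   = c(0, 1, 2, 3)
)

a <- names(df)[4:7]    # "Var1" "Var2" "Var3" "Var4"
out_dir <- tempdir()   # hypothetical output location
results <- list()      # keep each summary in memory as well

for (i in a) {
  # Base-R indexing with the column name held in i (not df$i)
  NEW <- df[!is.na(df[, i]), ]

  SUM <- NEW %>%
    group_by(Date) %>%
    mutate(Weights = Amount / sum(Amount)) %>%
    summarise(Value = weighted.mean(!!as.name(i), Weights))

  results[[i]] <- SUM
  write.csv(SUM, file.path(out_dir, paste0(i, ".csv")), row.names = FALSE)
}
```

Because the NA rows are dropped before the weights are computed, the weights for each variable are rescaled over the rows that actually have a value on that date.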
