
How do I modify an R function to loop over the columns of a data frame and over multiple datasets?

I'm new to R and have written some very clunky functions for geochemical datasets. They interpolate data to nearby dates, convert decimal dates, reshape and average the geochemical data by year/month, and return the result as a new data frame. However, each call handles only one column, there are between 2 and 10 data columns per dataset, and I have over 50 datasets. This means a lot of copying and pasting. I know there should be a better way, but I have tried for months without getting anywhere.

I have tried reading up on this but haven't been able to implement any loops I've seen suggested elsewhere.

Here is an example of my datasets:

Year    SrCa MgCa BaCa
1958.00 8.98 4.29 4.77
1958.08 9.00 4.21 4.56
1958.17 9.02 4.16 4.39
...  

Here are the functions I have written:

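library(lubridate) # decimal_date, date_decimal, year, month, day, ymd
library(reshape2)  # melt, dcast, colsplit
library(tidyr)     # gather

#Interpolates monthly or bimonthly data to dates for the 15th of every month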
yrmonth_INTERP<-function(dataset, agecolumn, variable1, var1name){
  X_in = dataset[[agecolumn]] #select X (Age) column
  y_in = dataset[[variable1]] # select y (data) column
  x_out <- seq.Date(as.Date("1920/01/15"), as.Date("2017/12/15"), by = "months") #create reference dates
  x_out <- decimal_date(x_out) #reference dates to decimal dates
  xy_int <- approx(x = X_in, y = y_in,  xout = x_out) #interpolate data
  xy_int <- signif(as.data.frame(xy_int, row.names = NULL), digits = 12)
  xy_int <- na.omit(xy_int)
  Age<-date_decimal(xy_int[[1]]) #convert decimal to date
  Year<-year(Age)
  Month<-month(Age)
  Day<-day(Age)
  var1<-xy_int[[2]] #pull out variable
  newdata<-cbind.data.frame(Year, Month, Day, var1) #create dataframe
  date1 <-paste(newdata$Year,newdata$Month, newdata$Day,sep="-") #put together separate time variables into date
  date1 <- ymd(date1) #convert to date
  data_months <- cbind(date1, newdata) #add date column to previous data frame
  colnames(data_months) = c('Age', 'Year', 'Months', 'Day', var1name) #name columns
  return (data_months)
}
#Turns lots of data points into the average for every month
yrmonth_avg<-function(dataset, agecolumn, variable1, var1name, varsum){
  Age<-date_decimal(dataset[[agecolumn]]) #convert decimal to date
  Year<-year(Age)
  Month<-month(Age)
  Day<-day(Age)
  var1<-dataset[[variable1]] #pull out data variable
  newdata<-cbind.data.frame(Year, Month, Day, var1) #create dataframe of time and data
  datamelt = melt(newdata, id = c('Year', 'Month', 'Day')) 
  datacast = dcast(datamelt, variable ~ Year + Month, mean) #wide cast/reshape data to row to get mean by year and month
  datacast2 = dcast(datamelt, variable ~ Year + Month, sum) #wide cast/reshape data to row to get sum by year and month
  Var1Data = datacast[-1] #remove first column
  Var1sum = datacast2[-1] #remove first column
  re_data = gather(Var1Data, key='Age', value = var1name) #reshape mean data to columns
  re_data1 = gsub("_", "-", re_data$Age) #pull out info to make date
  re_data2 <- ymd(re_data1, truncated = 1) #create date
  day(re_data2) <- 15
  newColNames <- c("Year", "Month")
  newCols <- colsplit(re_data1, "-", newColNames) #keep separated time period columns
  re_sum = gather(Var1sum, key='Age', value = 'Sum') #return sum data to columns
  data_months <- cbind(re_data2, re_data[[2]], re_sum[[2]], newCols) #create dataframe
  data_months[[4]] <- as.numeric(data_months[[4]])
  data_months[[5]] <- as.numeric(data_months[[5]])
  colnames(data_months) = c('Age', var1name, varsum, 'Year', 'Months')
  return (data_months)
}

And what I get out at the end is:

Age        SrCa     Year Months
1958-01-15 8.989589 1958 1
1958-02-15 9.009619 1958 2
1958-03-15 9.035000 1958 3 
...

Can I put a loop of some kind in there to apply the function to all columns in the dataframe, so I don't have to run the function 2-10 times to get all the averaged geochemical data?

Do I need to break up the different actions of the function to make this possible?

Can I apply this function across a list of the other dataframes?

EDIT: I realised I had extraneous data, and I should have mentioned that these are two separate functions that do essentially the same thing for datasets of different resolution.

If you put all your data frames in a list, this should do:

apply_all <- function(list_of_dfs){ # apply to all data frames
    # YRMONTH stands in for your function; apply() passes it each column as a plain vector
    return(lapply(list_of_dfs, function(df) apply(df, 2, YRMONTH))) # apply to all columns of a data frame
}
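
Since the functions in the question take a dataset plus column names rather than a bare column vector, another option is to loop over the column names instead. Below is a minimal base R sketch of that idea, not the asker's actual pipeline: yearly_mean_one is a hypothetical, much simpler stand-in for yrmonth_avg (it only averages one column by calendar year), process_dataset is a hypothetical wrapper, the age column is assumed to be called "Year", and the data values are made-up toy numbers.

```r
# Hypothetical stand-in for yrmonth_avg(): averages ONE data column by calendar year.
yearly_mean_one <- function(dataset, agecolumn, variable) {
  year <- floor(dataset[[agecolumn]])            # decimal year -> calendar year
  out <- aggregate(dataset[[variable]], by = list(Year = year), FUN = mean)
  names(out)[2] <- variable                      # name the result after the input column
  out
}

# Run the single-column function on every data column, then merge the results by Year.
process_dataset <- function(dataset, agecolumn = "Year") {
  variables <- setdiff(names(dataset), agecolumn) # every column except the age column
  per_column <- lapply(variables, function(v) yearly_mean_one(dataset, agecolumn, v))
  Reduce(function(a, b) merge(a, b, by = "Year"), per_column)
}

# Toy datasets (made-up values) with different sets of data columns.
core_a <- data.frame(Year = c(1958.00, 1958.08, 1959.00),
                     SrCa = c(8.98, 9.00, 9.02),
                     MgCa = c(4.29, 4.21, 4.16))
core_b <- data.frame(Year = c(1960.00, 1960.50),
                     SrCa = c(9.10, 9.20),
                     BaCa = c(4.77, 4.56))

# Outer loop: apply the per-dataset wrapper to every data frame in a named list.
datasets <- list(core_a = core_a, core_b = core_b)
results <- lapply(datasets, process_dataset)
```

Each element of results is one data frame with a Year column and one averaged column per variable. The same two-level pattern (an inner lapply over column names, an outer lapply over a list of data frames) should carry over if the stand-in is swapped for the real function, provided each call returns a data frame that shares a key column to merge on.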
