Loop a function over multiple datasets and multiple columns within each dataset
Writing an R function but how do I modify it to loop over columns in a dataframe and loop over multiple datasets?
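I'm new to R, and I've written some very clunky functions to apply to geochemical datasets: they interpolate the data to nearby dates, convert decimal dates, reshape and average the geochemical data by year/month, and finally spit the result out as a new data frame. However, they only handle one column at a time; each dataset has 2-10 columns of data, and I have more than 50 datasets. That means a lot of copying and pasting. I know there should be a better way to do this, but I've been trying for months without success.
I've tried reading up on this, but I haven't been able to implement any of the loops I've seen elsewhere.
Here is a sample of my dataset: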
Year SrCa MgCa BaCa
1958.00 8.98 4.29 4.77
1958.08 9.00 4.21 4.56
1958.17 9.02 4.16 4.39
...
Here are the functions I wrote:
library(lubridate)  # decimal_date(), date_decimal(), year(), month(), day(), ymd()

# Interpolates monthly or bimonthly data to dates for the 15th of every month
yrmonth_INTERP <- function(dataset, agecolumn, variable1, var1name) {
  X_in <- dataset[[agecolumn]]  # select X (Age) column
  y_in <- dataset[[variable1]]  # select y (data) column
  x_out <- seq.Date(as.Date("1920/01/15"), as.Date("2017/12/15"), by = "months")  # create reference dates
  x_out <- decimal_date(x_out)  # reference dates to decimal dates
  xy_int <- approx(x = X_in, y = y_in, xout = x_out)  # interpolate data
  xy_int <- signif(as.data.frame(xy_int, row.names = NULL), digits = 12)
  xy_int <- na.omit(xy_int)  # drop reference dates outside the data range
  Age <- date_decimal(xy_int[[1]])  # convert decimal to date
  Year <- year(Age)
  Month <- month(Age)
  Day <- day(Age)
  var1 <- xy_int[[2]]  # pull out variable
  newdata <- cbind.data.frame(Year, Month, Day, var1)  # create data frame
  date1 <- paste(newdata$Year, newdata$Month, newdata$Day, sep = "-")  # put separate time variables together into a date string
  date1 <- ymd(date1)  # convert to date
  data_months <- cbind(date1, newdata)  # add date column to previous data frame
  colnames(data_months) <- c('Age', 'Year', 'Months', 'Day', var1name)  # name columns
  return(data_months)
}
library(lubridate)  # date_decimal(), year(), month(), day(), ymd()
library(reshape2)   # melt(), dcast(), colsplit()
library(tidyr)      # gather()

# Turns lots of data points into the average for every month
yrmonth_avg <- function(dataset, agecolumn, variable1, var1name, varsum) {
  Age <- date_decimal(dataset[[agecolumn]])  # convert decimal to date
  Year <- year(Age)
  Month <- month(Age)
  Day <- day(Age)
  var1 <- dataset[[variable1]]  # pull out data variable
  newdata <- cbind.data.frame(Year, Month, Day, var1)  # create data frame of time and data
  datamelt <- melt(newdata, id = c('Year', 'Month', 'Day'))
  datacast <- dcast(datamelt, variable ~ Year + Month, mean)  # wide cast/reshape data to one row to get mean by year and month
  datacast2 <- dcast(datamelt, variable ~ Year + Month, sum)  # same reshape, but to get the sum by year and month
  Var1Data <- datacast[-1]  # remove first column
  Var1sum <- datacast2[-1]
  re_data <- gather(Var1Data, key = 'Age', value = var1name)  # reshape mean data to columns
  re_data1 <- gsub("_", "-", re_data$Age)  # pull out info to make date
  re_data2 <- ymd(re_data1, truncated = 1)  # create date
  day(re_data2) <- 15
  newColNames <- c("Year", "Month")
  newCols <- colsplit(re_data1, "-", newColNames)  # keep separated time period columns
  re_sum <- gather(Var1sum, key = 'Age', value = 'Sum')  # return sum data to columns
  data_months <- cbind(re_data2, re_data[[2]], re_sum[[2]], newCols)  # create data frame
  data_months[[4]] <- as.numeric(data_months[[4]])
  data_months[[5]] <- as.numeric(data_months[[5]])
  colnames(data_months) <- c('Age', var1name, varsum, 'Year', 'Months')
  return(data_months)
}
What I end up with is:
Age SrCa Year Months
1958-01-15 8.989589 1958 1
1958-02-15 9.009619 1958 2
1958-03-15 9.035000 1958 3
...
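Is there some kind of loop I can put in to apply the function to all the columns in a data frame, so that I don't have to run the geochemistry averaging 10 times?
Do I need to break up the different operations of the function to make that possible?
Can I then apply this function across a list of other data frames?
Edit: I realized I had redundant data, and that there are two separate functions I should have mentioned; they do essentially the same thing for datasets of different resolution.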
If you put all your data frames in a list, this should do it:
apply_all <- function(list_of_dfs) {  # apply to all data frames
  return(lapply(list_of_dfs, function(df) apply(df, 2, YRMONTH)))  # apply to all columns of a data frame
}
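Note that apply(df, 2, ...) hands each column to the function as a bare vector, while the functions in the question take the whole dataset plus column names. A minimal sketch of one way to bridge that, looping over column names instead, might look like the following. The names process_column, apply_to_columns and apply_to_datasets are hypothetical and not from the original post; process_column is only a base-R stand-in with the same argument order as yrmonth_INTERP, and the toy data is modelled on the sample in the question.

```r
# Hypothetical stand-in for yrmonth_INTERP: same argument order, but it only
# returns the age column plus one renamed variable (base R, no lubridate).
process_column <- function(dataset, agecolumn, variable1, var1name) {
  out <- data.frame(Age = dataset[[agecolumn]], value = dataset[[variable1]])
  names(out)[2] <- var1name
  out
}

# Run `fun` once per data column of one data frame, then merge the pieces
# on whatever columns they share (here: Age).
apply_to_columns <- function(dataset, fun, agecolumn = "Year") {
  vars <- setdiff(names(dataset), agecolumn)  # every column except the age column
  pieces <- lapply(vars, function(v) fun(dataset, agecolumn, v, v))
  Reduce(function(a, b) merge(a, b, all = TRUE), pieces)
}

# Run the per-column wrapper over a (named) list of data frames.
apply_to_datasets <- function(list_of_dfs, fun, agecolumn = "Year") {
  lapply(list_of_dfs, function(df) apply_to_columns(df, fun, agecolumn))
}

# Toy data modelled on the sample in the question
core1 <- data.frame(Year = c(1958.00, 1958.08, 1958.17),
                    SrCa = c(8.98, 9.00, 9.02),
                    MgCa = c(4.29, 4.21, 4.16),
                    BaCa = c(4.77, 4.56, 4.39))
core2 <- core1[, c("Year", "SrCa")]  # a second dataset with fewer columns

results <- apply_to_datasets(list(core1 = core1, core2 = core2), process_column)
str(results)
```

Passing yrmonth_INTERP in place of process_column should work the same way, since the merge then happens on the shared Age, Year, Months and Day columns. yrmonth_avg takes an extra varsum argument, so it would need a small wrapper that supplies a name for the sum column.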