Running a linear regression on multiple files in R
I have a folder containing 20 text files from multiple instrument runs. The data sections all have the same format:
22/05/11; 16:03:28; 0.000; 6.079; 31.41; 84881; 25.60; E0;
22/05/11; 16:03:29; 0.017; 6.063; 31.44; 84868; 25.60; E0;
22/05/11; 16:03:30; 0.034; 6.079; 31.41; 84868; 25.60; E0;
22/05/11; 16:03:31; 0.051; 6.079; 31.41; 84868; 25.60; E0;
22/05/11; 16:03:32; 0.068; 6.068; 31.43; 84868; 25.60; E0;
22/05/11; 16:03:33; 0.085; 6.068; 31.43; 84881; 25.60; E0;
22/05/11; 16:03:34; 0.102; 6.079; 31.41; 84874; 25.60; E0;
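What I want to do is read every file in my folder, run a linear regression, and pull out the slope and R2 values.
So far, this is the code I use to do this for a single file: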
O2=read.table("Coral 1_Dark.txt",skip=58, sep=";",header=FALSE)
names(O2)<-c("Date","Time","Log_Time","O2_mgL","Phase","Amp","Temp C","Error Message")
O2$id<-seq_len(nrow(O2)) #creates unique ID for each measurement (use for regression)
attach(O2)
fit=lm(O2_mgL~id)
summary(fit)
After running this code, I enter the slope and R2 values by hand.
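The slope and R2 do not have to be retyped from the printed summary, since both are stored in the fitted objects. A minimal sketch, assuming a small data frame built from the seven O2 readings shown above (the name `demo` is hypothetical and stands in for the data read from a file):

```r
# Hypothetical stand-in for the O2 data frame read from one file
demo <- data.frame(O2_mgL = c(6.079, 6.063, 6.079, 6.079, 6.068, 6.068, 6.079))
demo$id <- seq_len(nrow(demo))

fit <- lm(O2_mgL ~ id, data = demo)   # data= avoids the need for attach()
slope <- unname(coef(fit)["id"])      # regression slope
r2 <- summary(fit)$r.squared          # R-squared, on a 0 to 1 scale
c(Slope = slope, R2 = r2)
```

Multiplying `r2` by 100 would give the percentage form used in the desired table.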
I can already create a variable that holds all the files I am interested in:
F=list.files()
This gives me all 20 files:
[1] "Coral 1_Dark.txt" "Coral 1_Light.txt" "Coral 10_Dark.txt" "Coral 10_Light.txt" "Coral 2_Dark.txt"
[6] "Coral 2_Light.txt" "Coral 3_Dark.txt" "Coral 3_Light.txt" "Coral 4_Dark.txt" "Coral 4_Light.txt"
[11] "Coral 5_Dark.txt" "Coral 5_Light.txt" "Coral 6_Dark.txt" "Coral 6_Light.txt" "Coral 7_Dark.txt"
[16] "Coral 7_Light.txt" "Coral 8_Dark.txt" "Coral 8_Light.txt" "Coral 9_Dark.txt" "Coral 9_Light.txt"
What I ultimately want, for all 20 files, is something like this:
Coral Slope R2
Coral 1_Dark 0.23 98.3
Coral 2_Dark 0.33 99.3
etc.
Any suggestions? I have never used any of the apply functions or any kind of loop, but I think that is about to change...
Something like this?
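wd <- "C:/Data"
# List only the .txt files so that stray files in the folder are not read
files <- dir(wd, pattern = "\\.txt$")
varnames <- c("Date", "Time", "Log_Time", "O2_mgL", "Phase", "Amp", "Temp C",
              "Error Message")
results <- data.frame()
for (i in seq_along(files)) {
  fname <- file.path(wd, files[i])
  data <- read.table(fname, sep = ";", skip = 58)
  # Name the first eight columns; the trailing ";" may yield an extra empty column
  colnames(data)[seq_along(varnames)] <- varnames
  data$id <- seq_len(nrow(data))
  fit <- summary(lm(O2_mgL ~ id, data = data))
  results[i, 1] <- fit$coefficients["id", "Estimate"]  # slope
  results[i, 2] <- fit$r.squared                       # R-squared
}
# Escape the dot so that only a literal ".txt" suffix is removed
rownames(results) <- sub("\\.txt$", "", files)
colnames(results) <- c("Slope", "R2")
print(results)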
The code below might have a chance of working. Using attach is not a good idea, especially when writing functions.
# Stand-in for one file: the sample rows stored as a string
Coral_1_Dark.txt <- "22/05/11; 16:03:28; 0.000; 6.079; 31.41; 84881; 25.60; E0;
22/05/11; 16:03:29; 0.017; 6.063; 31.44; 84868; 25.60; E0;
22/05/11; 16:03:30; 0.034; 6.079; 31.41; 84868; 25.60; E0;
22/05/11; 16:03:31; 0.051; 6.079; 31.41; 84868; 25.60; E0;
22/05/11; 16:03:32; 0.068; 6.068; 31.43; 84868; 25.60; E0;
22/05/11; 16:03:33; 0.085; 6.068; 31.43; 84881; 25.60; E0;
22/05/11; 16:03:34; 0.102; 6.079; 31.41; 84874; 25.60; E0;"
list_of_summaries <- sapply('Coral_1_Dark.txt', function(nam) {
  # get(nam) fetches the string above; with real files, pass the file name instead
  O2 <- read.table(file = textConnection(get(nam)), sep = ";", header = FALSE)
  # The trailing ";" on each row produces a ninth, empty column ("junk")
  names(O2) <- c("Date", "Time", "Log_Time", "O2_mgL", "Phase",
                 "Amp", "Temp C", "Error Message", "junk")
  O2$id <- seq_len(nrow(O2))
  fit <- lm(O2_mgL ~ id, data = O2)   # data= replaces attach()
  summ <- summary(fit)
  c(slope = unname(coef(fit)["id"]), R2 = summ[["r.squared"]])
})
as.data.frame(list_of_summaries)
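The sapply call above returns one column per file, whereas the question asks for one row per file. One possible way to flip the result is to transpose it first. The sketch below uses a made-up matrix (`m` and its numbers are purely illustrative, not results from the data):

```r
# Illustrative matrix shaped like the sapply() result: one column per file
m <- cbind("Coral 1_Dark" = c(slope = 0.23, R2 = 0.983),
           "Coral 2_Dark" = c(slope = 0.33, R2 = 0.993))

res <- as.data.frame(t(m))   # transpose: one row per file, one column per statistic
res
```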
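Yes, you are about to get an introduction to the apply functions. Basically, call dir() on the folder, then wrap everything you have done in a function that takes a single file argument and returns the results you are interested in. Then call lapply on the list of files, with the function as the second argument.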
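The steps just described can be sketched roughly as follows. The helper name `fit_one_file`, its `skip` argument, and the throwaway demo files are assumptions made for illustration. The demo readings are made up, and the demo files have no header, so `skip = 0` is passed there.

```r
# Hypothetical helper: fit O2_mgL ~ id for one file and return slope and R2.
# skip = 58 matches the header length in the question; adjust for other files.
fit_one_file <- function(path, skip = 58) {
  O2 <- read.table(path, sep = ";", skip = skip, header = FALSE)
  O2 <- O2[, 1:8]   # drop the empty column created by the trailing ";"
  names(O2) <- c("Date", "Time", "Log_Time", "O2_mgL", "Phase", "Amp",
                 "Temp C", "Error Message")
  O2$id <- seq_len(nrow(O2))
  fit <- lm(O2_mgL ~ id, data = O2)
  data.frame(Coral = sub("\\.txt$", "", basename(path)),
             Slope = unname(coef(fit)["id"]),
             R2 = summary(fit)$r.squared)
}

# Demo on two throwaway files in a temporary folder (made-up readings)
demo_dir <- file.path(tempdir(), "coral_demo")
dir.create(demo_dir, showWarnings = FALSE)
o2 <- 6.0 + (0:6) * 0.01 + c(0, 0.002, -0.001, 0.001, 0, -0.002, 0.001)
rows <- sprintf("22/05/11; 16:03:%02d; %.3f; %.3f; 31.41; 84868; 25.60; E0;",
                28:34, (0:6) * 0.017, o2)
writeLines(rows, file.path(demo_dir, "Coral 1_Dark.txt"))
writeLines(rev(rows), file.path(demo_dir, "Coral 1_Light.txt"))

# List the files, apply the helper to each, and stack the one-row results
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
results <- do.call(rbind, lapply(files, fit_one_file, skip = 0))
results
```

For the real folder, something like `lapply(list.files("C:/Data", pattern = "\\.txt$", full.names = TRUE), fit_one_file)` would use the default `skip = 58`. Returning a one-row data frame per file keeps the final `rbind` step simple.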