R: Faster alternative for nested loops
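I want to run Wilcoxon tests. I have two lists of data frames. Datalist contains the counts of different observations over a 2-year period. Varlist contains Case and Control days under different scenarios.
I now want to check, for each observation, whether the counts differ between Case days and Control days in each scenario. For that I use wilcox.test(~).
As output I would like a data frame containing the two values for Case and Control, the p-value, and of course all the list and column names, so that the results can be matched correctly.
I have a working solution with four nested loops, but it is very slow (it would take at least 10 days). Does anyone know how to solve this with faster code?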
# Observation counts: one row per day over two years.
# Note: the n = 365 sampled values are recycled by data.frame() across the 730 dates.
set.seed(42)
n <- 365
df1 = data.frame(Date=seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), "day"),
                 D1 = sample(18:30, n, replace=TRUE),
                 D2 = sample(0:7, n, replace=TRUE),
                 D3 = sample(0:10, n, replace=TRUE),
                 D4 = sample(0:4, n, replace=TRUE),
                 D5 = sample(0:23, n, replace=TRUE))
set.seed(7)
n <- 365
df2 = data.frame(Date=seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), "day"),
                 D1 = sample(18:30, n, replace=TRUE),
                 D2 = sample(0:7, n, replace=TRUE),
                 D3 = sample(0:10, n, replace=TRUE),
                 D4 = sample(0:4, n, replace=TRUE),
                 D5 = sample(0:23, n, replace=TRUE))
set.seed(9)
n <- 365
df3 = data.frame(Date=seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), "day"),
                 D1 = sample(18:30, n, replace=TRUE),
                 D2 = sample(0:7, n, replace=TRUE),
                 D3 = sample(0:10, n, replace=TRUE),
                 D4 = sample(0:4, n, replace=TRUE),
                 D5 = sample(0:23, n, replace=TRUE))
# Named, so that names(Datalist)[a] returns a label instead of NULL
Datalist = list(df1 = df1, df2 = df2, df3 = df3)
set.seed(2)
n <- 365
Var1 = data.frame(Date=seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), "day"),
                  V1 = sample(c("Case", "Control", NA), n, replace=TRUE),
                  V2 = sample(c(NA, "Case", "Control"), n, replace=TRUE),
                  V3 = sample(c("Control", "Case", NA), n, replace=TRUE))
set.seed(6)
n <- 365
Var2 = data.frame(Date=seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), "day"),
                  V1 = sample(c("Case", "Control", NA), n, replace=TRUE),
                  V2 = sample(c(NA, "Case", "Control"), n, replace=TRUE),
                  V3 = sample(c("Control", "Case", NA), n, replace=TRUE))
set.seed(23)
n <- 365
Var3 = data.frame(Date=seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), "day"),
                  V1 = sample(c("Case", "Control", NA), n, replace=TRUE),
                  V2 = sample(c(NA, "Case", "Control"), n, replace=TRUE),
                  V3 = sample(c("Control", "Case", NA), n, replace=TRUE))
# Named, so that names(Varlist)[c] returns a label instead of NULL
Varlist = list(Var1 = Var1, Var2 = Var2, Var3 = Var3)
Edit: here is my code:
Results = data.frame(matrix(ncol = 7, nrow = 0))
colnames(Results) = c("Code","ICD", "Cond", "Case", "Control", "pValue", "Ver")
for (a in 1:length(Datalist)) {
  print(names(Datalist)[a])
  for (b in 2:length(Datalist[[a]])) {
    for (c in 1:length(Varlist)) {
      for (d in 2:ncol(Varlist[[c]])){
        Ill = Datalist[[a]][,b]
        cutpoint = nrow(Datalist[[a]])
        Group = Varlist[[c]][,d]
        Group = Group[1:cutpoint]
        casecontrol = na.omit(data.frame(Ill, Group))
        wiltest = wilcox.test(casecontrol$Ill ~ casecontrol$Group)
        stats = tapply(casecontrol$Ill,casecontrol$Group,mean)
        Code = names(Datalist)[a]
        ICD = colnames(Datalist[[a]])[b]
        Cond = colnames(Varlist[[c]])[d]
        Case = round(stats[1],2)
        Control = round(stats[2],2)
        pValue = round(wiltest$p.value, 2)
        Ver = names(Varlist)[c]
        addrow = c(Code, ICD, Case, Control, pValue, Ver)
        Results= rbind(Results,addrow)
      }
    }
  }
}
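Your code has two errors. First, addrow = c(Code, ICD, Case, Control, pValue, Ver) has only 6 elements (Cond is missing), but Results was created with 7 columns. Second, addrow = c(Code, ICD, Case, Control, pValue, Ver) mixes character and numeric data, which coerces everything to character.
The function g below fixes both errors and runs about twice as fast. Once the errors in your version are corrected, the two versions return the same results. The main difference is that memory for the results is preallocated before the loops, and the returned data.frame is created only once, at the end.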
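As a quick aside before the corrected function, both errors can be reproduced in isolation. This is a minimal sketch with made-up values (the numbers and the two-row size are assumptions for illustration, not results from the data above). It shows c() coercing mixed input to character, and typed, preallocated vectors keeping their types.

```r
# Values below are made up purely for illustration.
addrow <- c("df1", "D1", 24.13, 23.87, 0.42, "Var1")
class(addrow)    # "character": the numbers were coerced to strings
length(addrow)   # 6, one short of the 7 declared columns

# Keeping each column as its own typed, preallocated vector avoids the coercion.
Code <- character(2)
Case <- numeric(2)
Code[1] <- "df1"; Case[1] <- 24.13
Code[2] <- "df2"; Case[2] <- 23.50

# Build the data frame once, after all values are filled in.
out <- data.frame(Code, Case)
is.numeric(out$Case)   # TRUE
```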
g <- function(Datalist, Varlist) {
  # Total number of combinations; assumes all data frames in a list have the same number of columns
  ntotal <- length(Datalist) * (length(Datalist[[1]]) - 1L) * length(Varlist) * (ncol(Varlist[[1]]) - 1L)
  # Preallocate one typed vector per result column
  Code <- character(ntotal)
  ICD <- character(ntotal)
  Cond <- character(ntotal)
  Case <- numeric(ntotal)
  Control <- numeric(ntotal)
  pValue <- numeric(ntotal)
  Ver <- character(ntotal)
  i <- 0L
  for (a in 1:length(Datalist)) {
    print(names(Datalist)[a])
    for (b in 2:length(Datalist[[a]])) {
      for (c in 1:length(Varlist)) {
        for (d in 2:ncol(Varlist[[c]])){
          Ill = Datalist[[a]][,b]
          cutpoint = nrow(Datalist[[a]])
          Group = Varlist[[c]][,d]
          Group = Group[1:cutpoint]
          # Drop days where the group (or the count) is missing
          casecontrol = na.omit(data.frame(Ill, Group))
          wiltest = wilcox.test(Ill ~ Group, data = casecontrol)
          # Group means, in the order Case, Control
          stats = tapply(casecontrol$Ill,casecontrol$Group,mean)
          # Fill the next slot of each preallocated vector
          i <- i + 1L
          Code[i] = names(Datalist)[a]
          ICD[i] = colnames(Datalist[[a]])[b]
          Cond[i] = colnames(Varlist[[c]])[d]
          Case[i] = round(stats[1],2)
          Control[i] = round(stats[2],2)
          pValue[i] = round(wiltest$p.value, 2)
          Ver[i] = names(Varlist)[c]
        }
      }
    }
  }
  # Create the result data.frame once, at the end
  data.frame(Code, ICD, Cond, Case, Control, pValue, Ver)
}
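The four loops can also be flattened into a single pass over an index grid. The following is only a sketch of that idea and is not part of the answer above: wilcox_grid and the toy data are hypothetical names. It assumes every data frame in a list has the same columns with Date first, and it calls wilcox.test(x, y) on plain vectors instead of the formula interface, so no data frame is built per combination. It has not been benchmarked here, so any speed gain is an assumption to verify on your own data.

```r
# Hypothetical helper, not from the original answer.
wilcox_grid <- function(Datalist, Varlist) {
  # Fall back to positional labels if the lists are unnamed.
  dnames <- names(Datalist)
  if (is.null(dnames)) dnames <- paste0("df", seq_along(Datalist))
  vnames <- names(Varlist)
  if (is.null(vnames)) vnames <- paste0("Var", seq_along(Varlist))

  # One row per combination; the first argument varies fastest, so the
  # row order matches the nested loops (data frame outermost).
  grid <- expand.grid(
    grp_col = seq_len(ncol(Varlist[[1]]))[-1],
    var_i   = seq_along(Varlist),
    obs_col = seq_len(ncol(Datalist[[1]]))[-1],
    data_i  = seq_along(Datalist)
  )

  # Returns a 3 x nrow(grid) matrix: Case mean, Control mean, p-value.
  stats <- vapply(seq_len(nrow(grid)), function(i) {
    ill   <- Datalist[[grid$data_i[i]]][[grid$obs_col[i]]]
    group <- Varlist[[grid$var_i[i]]][[grid$grp_col[i]]][seq_along(ill)]
    # which() drops NA comparisons, so days with a missing group are skipped.
    x <- ill[which(group == "Case" & !is.na(ill))]
    y <- ill[which(group == "Control" & !is.na(ill))]
    c(mean(x), mean(y), wilcox.test(x, y)$p.value)
  }, numeric(3))

  data.frame(
    Code    = dnames[grid$data_i],
    ICD     = names(Datalist[[1]])[grid$obs_col],
    Cond    = names(Varlist[[1]])[grid$grp_col],
    Case    = round(stats[1, ], 2),
    Control = round(stats[2, ], 2),
    pValue  = round(stats[3, ], 2),
    Ver     = vnames[grid$var_i]
  )
}

# Toy data (made up): 2 data frames x 2 observation columns,
# 2 scenario frames x 2 group columns, giving 16 combinations.
set.seed(1)
n <- 300
dates <- seq.Date(as.Date("2017-01-01"), by = "day", length.out = n)
make_obs <- function() data.frame(Date = dates,
                                  D1 = sample(18:30, n, replace = TRUE),
                                  D2 = sample(0:7, n, replace = TRUE))
make_grp <- function() data.frame(Date = dates,
                                  V1 = sample(c("Case", "Control", NA), n, replace = TRUE),
                                  V2 = sample(c("Case", "Control", NA), n, replace = TRUE))
toy_data <- list(df1 = make_obs(), df2 = make_obs())
toy_vars <- list(Var1 = make_grp(), Var2 = make_grp())

res <- wilcox_grid(toy_data, toy_vars)
dim(res)
```

Most of the remaining time is likely spent inside wilcox.test itself, so it is worth timing both versions with system.time() on a subset before committing to either.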