How to take each combination from two lists?

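I take the available data and filter it according to some criteria (removing rows based on specific values of a column). Then I train a model on this data. Later, I take the same data from scratch again, but this time I test the model using either the same criteria as before or different ones. Then I make ROC and waterfall plots. My problem is that I want to take each combination from two lists. For example: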

list1 = list(c('a','b','c'),c('A','B','C'))
list2 = list(c('x','y','z'),c('X','Y','Z'))

I want a for loop that runs the analysis with c('a','b','c') and c('x','y','z'), then with c('a','b','c') and c('X','Y','Z'), then continues with c('A','B','C') and c('x','y','z'), and finally with c('A','B','C') and c('X','Y','Z').

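Here is my code. I know you might say that use_train and use_test are identical. They will not stay that way; this is only temporary. It is easier for me to work with two lists than with one. Each model and each plot is stored in a list that I create before the for loop. Should I create a for loop inside a for loop?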

use_train = list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test = list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model

xgb_models = auc_test = auc_test_plot = data_list = waterfall = list() 

for(i in 1:length(use_train)){
  
  data_list[[i]] = create_data(mydata,metadata, 
                                  recist.use = use_train[[i]], case = 'CR', use_batch = FALSE, seed=40)
  
  xgb_models[[i]] = train_ici(data_list[[i]])
  #parallelStop()
  
  auc_test[[i]] = evaluate_model(xgb_models[[i]], mydata, metadata, 
                         recist.use = use_test[[i]], case = 'CR' , use_batch = FALSE, seed = 40)
  
  auc_test_plot[[i]] = evaluate_model_plot(xgb_models[[i]], data_list[[i]][[2]])
  
  waterfall[[i]] = waterfall(xgb_models[[i]], metadata, data_list[[i]][[2]], case  = 'CR',
                                train.recist = use_train[[i]], test.recist = use_test[[i]])
}

So in the end, I will have 4 rounds:

  1. From use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD')
  2. From use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD','PD')
  3. From use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD')
  4. From use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD','PD')
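
The four rounds above correspond to one loop nested inside another. A minimal sketch of that pattern, using the toy lists from the example; run_analysis() is a hypothetical placeholder (not part of the original code) standing in for the create_data / train_ici / evaluate_model steps:

```r
# Toy lists from the example in the question
list1 <- list(c('a','b','c'), c('A','B','C'))
list2 <- list(c('x','y','z'), c('X','Y','Z'))

# Hypothetical placeholder for the real analysis: it only labels the pair
run_analysis <- function(l1, l2) {
  paste(paste(l1, collapse = ""), paste(l2, collapse = ""), sep = "-")
}

results <- list()
k <- 0  # flat counter so every (i, j) pair gets its own slot
for (i in seq_along(list1)) {
  for (j in seq_along(list2)) {
    k <- k + 1
    results[[k]] <- run_analysis(list1[[i]], list2[[j]])
  }
}

unlist(results)
#> [1] "abc-xyz" "abc-XYZ" "ABC-xyz" "ABC-XYZ"
```

The flat counter k keeps the results in a single list in the order listed above; indexing by i alone would overwrite the first test round with the second.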

Edit -

This sample is from the data after the function create_data, so here the data has already been created and is ready for the train_ici function.

structure(list(`totaldata_new[, "RECIST"]` = c("PD", "SD", "PR", 
"PD", "PD", "PD", "PD", "PR", "SD", "PD", "SD", "PD", "PD", "PD", 
"PR", "CR", "PD", "PR", "SD", "SD", "SD", "PD", "SD", "PR", "PD"
), Gender = c("male", "female", "female", "female", "male", "female", 
"female", "male", "male", "male", "female", "male", "female", 
"female", "male", "female", "female", "male", "male", "male", 
"female", "male", "female", "male", "male"), treatment = c("anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1"
), Cancer_Type = c("Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma"), `CD4-T-cells` = c(-0.0741098696855045, 
-0.094401270881699, 0.0410284948786532, -0.163302950330185, -0.0942478217207681, 
-0.167314411991775, -0.118272811489486, -0.0366277340916379, 
-0.0349008907108641, -0.167823357941815, -0.0809646843667242, 
-0.140727850456348, -0.148668434567449, -0.0726825919321525, 
-0.062499826731091, -0.0861178015030313, -0.117687306656149, 
-0.141342090175904, -0.206661192280272, -0.15593285099477, -0.0897617831679252, 
-0.0627645386986058, -0.136416087222329, -0.100351419040291, 
-0.167041995646525)), row.names = c("Pt1", "Pt10", "Pt101", "Pt103", 
"Pt106", "Pt11", "Pt17", "Pt18", "Pt2", "Pt24", "Pt26", "Pt27", 
"Pt28", "Pt29", "Pt3", "Pt30", "Pt31", "Pt34", "Pt36", "Pt37", 
"Pt38", "Pt39", "Pt4", "Pt44", "Pt46"), class = "data.frame")

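In R, one usually tries to avoid for loops when the problem is parallelizable. Instead, you can create a data frame that holds all combinations, built with expand.grid(), and add a column with the corresponding result: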

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

list1 <- list(c('a','b','c'),c('A','B','C'))
list2 <- list(c('x','y','z'),c('X','Y','Z'))

# do some stuff with the 2 vars
do_stuff <- function(l1, l2) {
  length(l1) + length(l2) + runif(1)
}

expand.grid(list1, list2) |>
  rowwise() |>
  mutate(result = do_stuff(Var1, Var2))
#> # A tibble: 4 × 3
#> # Rowwise: 
#>   Var1      Var2      result
#>   <list>    <list>     <dbl>
#> 1 <chr [3]> <chr [3]>   6.43
#> 2 <chr [3]> <chr [3]>   6.91
#> 3 <chr [3]> <chr [3]>   6.26
#> 4 <chr [3]> <chr [3]>   6.08

Created on 2023-01-07 by the reprex package (v2.0.1)
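
One detail worth noting: expand.grid() varies its first argument fastest, so expand.grid(list1, list2) visits the pairs in a different order than the one listed in the question. A minimal base R sketch, assuming the order matters, that builds a grid of indices with the test index first so that it varies fastest; describe_pair() is a hypothetical helper used only to show which pair each row refers to:

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# The first argument of expand.grid() varies fastest, so put test first
grid <- expand.grid(test = seq_along(use_test), train = seq_along(use_train))

# Hypothetical helper: describes one train/test pair by its indices
describe_pair <- function(i, j) {
  sprintf("train=%s | test=%s",
          paste(use_train[[i]], collapse = ","),
          paste(use_test[[j]], collapse = ","))
}

# Map() walks the two index columns in parallel, one call per grid row
rounds <- unlist(Map(describe_pair, grid$train, grid$test))
rounds
```

In a real run, describe_pair() would be replaced by a function that calls the modelling steps with use_train[[i]] and use_test[[j]].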

Here is my proposal. I use lapply and unlist to create a list of lists, then lapply over it instead of appending to each list.

use_train <- list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test <- list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model

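# build every train/test pair; recursive = FALSE flattens only one level,
# giving a flat list of 4 elements, each with $train and $test
train_test <- unlist(lapply(use_train, \(x) lapply(use_test, \(y) list(
  train=x,test=y))), recursive = FALSE)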

output = lapply(train_test, function(tt){
  data_list <- create_data(mydata,metadata, 
           recist.use = tt$train, case = 'CR', use_batch = FALSE, seed=40)
  
  xgb_models <- train_ici(data_list)

  auc_test <- evaluate_model(
    xgb_models, mydata, metadata, 
    recist.use = tt$test, case = 'CR' , use_batch = FALSE, seed = 40)
  
  auc_test_plot <- evaluate_model_plot(
    xgb_models, data_list[[2]])
  
  waterfall <- waterfall(
    xgb_models, metadata, data_list[[2]], case  = 'CR',
    train.recist = tt$train, test.recist = tt$test)
  
  return(list(
    data_list = data_list, xgb_models = xgb_models, auc_test = auc_test,
    auc_test_plot = auc_test_plot, waterfall = waterfall))
})

output
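
Since lapply() keeps the names of its input, naming the train/test pairs first makes it easier to find a given round in the result. A minimal sketch, assuming labels built from the criteria are acceptable; label() and the stand-in body of the second lapply() are hypothetical and only replace the modelling calls so that the example runs on its own:

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# Same construction as above: a flat list of 4 train/test pairs
train_test <- unlist(
  lapply(use_train, function(x) lapply(use_test, function(y) list(train = x, test = y))),
  recursive = FALSE
)

# Hypothetical helper: builds a readable label for one pair
label <- function(tt) {
  paste0("train:", paste(tt$train, collapse = "+"),
         "_test:", paste(tt$test, collapse = "+"))
}
names(train_test) <- vapply(train_test, label, character(1))

# Stand-in for the modelling steps; the result inherits the names
output <- lapply(train_test, function(tt) {
  list(n_train = length(tt$train), n_test = length(tt$test))
})

names(output)
output[["train:CR+PR+SD_test:CR+PR+SD+PD"]]$n_test
#> [1] 4
```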


