How to get every combination from two lists?

Question

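I fetch the available data and filter it on certain conditions (removing rows according to specific values of a column). Then I train a model on this data. Later, I fetch the same data again from scratch, but this time I test the model using either the same criteria as before or different criteria. Then I make ROC and waterfall plots. My problem is that I want to get every combination from two lists. For example: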

list1 = list(c('a','b','c'),c('A','B','C'))
list2 = list(c('x','y','z'),c('X','Y','Z'))

I want a for loop that runs the analysis for c('a','b','c') and c('x','y','z'), then for c('a','b','c') and c('X','Y','Z'). After that it should continue with c('A','B','C') and c('x','y','z'), and finally c('A','B','C') and c('X','Y','Z').

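Here is my code. I know you may point out that use_train and use_test are identical. They will not stay that way; this is only temporary. For me, working with two lists is easier than working with one. Each model and each plot is stored in a list that I create before the for loop. Should I create a for loop inside a for loop?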

use_train = list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test = list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model

xgb_models = auc_test = auc_test_plot = data_list = waterfall = list() 

for(i in 1:length(use_train)){
  
  data_list[[i]] = create_data(mydata,metadata, 
                                  recist.use = use_train[[i]], case = 'CR', use_batch = FALSE, seed=40)
  
  xgb_models[[i]] = train_ici(data_list[[i]])
  #parallelStop()
  
  auc_test[[i]] = evaluate_model(xgb_models[[i]], mydata, metadata, 
                         recist.use = use_test[[i]], case = 'CR' , use_batch = FALSE, seed = 40)
  
  auc_test_plot[[i]] = evaluate_model_plot(xgb_models[[i]], data_list[[i]][[2]])
  
  waterfall[[i]] = waterfall(xgb_models[[i]], metadata, data_list[[i]][[2]], case  = 'CR',
                                train.recist = use_train[[i]], test.recist = use_test[[i]])
}

So in the end, I will run 4 rounds:

From use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD')
From use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD','PD')
From use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD')
From use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD','PD')

Edit -

This sample comes from the data after the create_data function, so the data has already been created and is ready for the train_ici function.

structure(list(`totaldata_new[, "RECIST"]` = c("PD", "SD", "PR", 
"PD", "PD", "PD", "PD", "PR", "SD", "PD", "SD", "PD", "PD", "PD", 
"PR", "CR", "PD", "PR", "SD", "SD", "SD", "PD", "SD", "PR", "PD"
), Gender = c("male", "female", "female", "female", "male", "female", 
"female", "male", "male", "male", "female", "male", "female", 
"female", "male", "female", "female", "male", "male", "male", 
"female", "male", "female", "male", "male"), treatment = c("anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", 
"anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1", "anti-PD1"
), Cancer_Type = c("Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", "Melanoma", 
"Melanoma", "Melanoma", "Melanoma"), `CD4-T-cells` = c(-0.0741098696855045, 
-0.094401270881699, 0.0410284948786532, -0.163302950330185, -0.0942478217207681, 
-0.167314411991775, -0.118272811489486, -0.0366277340916379, 
-0.0349008907108641, -0.167823357941815, -0.0809646843667242, 
-0.140727850456348, -0.148668434567449, -0.0726825919321525, 
-0.062499826731091, -0.0861178015030313, -0.117687306656149, 
-0.141342090175904, -0.206661192280272, -0.15593285099477, -0.0897617831679252, 
-0.0627645386986058, -0.136416087222329, -0.100351419040291, 
-0.167041995646525)), row.names = c("Pt1", "Pt10", "Pt101", "Pt103", 
"Pt106", "Pt11", "Pt17", "Pt18", "Pt2", "Pt24", "Pt26", "Pt27", 
"Pt28", "Pt29", "Pt3", "Pt30", "Pt31", "Pt34", "Pt36", "Pt37", 
"Pt38", "Pt39", "Pt4", "Pt44", "Pt46"), class = "data.frame")

Answer 1

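In R, for loops are usually avoided when the problem is parallelizable. Instead, you can create a data frame that holds all combinations with expand.grid() and add a column with the corresponding results: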

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

list1 <- list(c('a','b','c'),c('A','B','C'))
list2 <- list(c('x','y','z'),c('X','Y','Z'))

# do some stuff with the 2 vars
do_stuff <- function(l1, l2) {
  length(l1) + length(l2) + runif(1)
}

expand.grid(list1, list2) |>
  rowwise() |>
  mutate(result = do_stuff(Var1, Var2))
#> # A tibble: 4 × 3
#> # Rowwise: 
#>   Var1      Var2      result
#>   <list>    <list>     <dbl>
#> 1 <chr [3]> <chr [3]>   6.43
#> 2 <chr [3]> <chr [3]>   6.91
#> 3 <chr [3]> <chr [3]>   6.26
#> 4 <chr [3]> <chr [3]>   6.08

Created on 2023-01-07 by the reprex package (v2.0.1)
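The same idea can be sketched for the question's own lists in base R by expanding a grid of indices rather than the list elements themselves. This is only a minimal sketch under stated assumptions: `run_one` is a hypothetical stand-in for the create_data / train_ici / evaluate_model steps and just returns a label, so the example runs without the asker's data.

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# One row per (train, test) index pair. expand.grid varies its first
# argument fastest, so listing `test` first makes the train index change
# slowest, matching the round order described in the question.
combos <- expand.grid(test = seq_along(use_test), train = seq_along(use_train))

# Hypothetical stand-in for the real training and evaluation steps
run_one <- function(train, test) {
  paste0("train: ", paste(train, collapse = ","),
         " | test: ", paste(test, collapse = ","))
}

# Apply run_one to every index pair; results is a list with one entry per row
results <- Map(function(i, j) run_one(use_train[[i]], use_test[[j]]),
               combos$train, combos$test)
```

Working with indices keeps `combos` a plain data frame of integers, so a result can be traced back to its criteria with `use_train[[combos$train[k]]]` and `use_test[[combos$test[k]]]`.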

Answer 2

Here is my suggestion. I use lapply and unlist to create a list of lists. Then I use lapply to build the results instead of appending to each list.

use_train <- list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test <- list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model

train_test <- unlist(lapply(use_train, \(x) lapply(use_test, \(y) list(
  train = x, test = y))), recursive = FALSE)

output = lapply(train_test, function(tt){
  data_list <- create_data(mydata,metadata, 
           recist.use = tt$train, case = 'CR', use_batch = FALSE, seed=40)
  
  xgb_models <- train_ici(data_list)

  auc_test <- evaluate_model(
    xgb_models, mydata, metadata, 
    recist.use = tt$test, case = 'CR' , use_batch = FALSE, seed = 40)
  
  auc_test_plot <- evaluate_model_plot(
    xgb_models, data_list[[2]])
  
  waterfall <- waterfall(
    xgb_models, metadata, data_list[[2]], case  = 'CR',
    train.recist = tt$train, test.recist = tt$test)
  
  return(list(
    data_list = data_list, xgb_models = xgb_models, auc_test = auc_test,
    auc_test_plot = auc_test_plot, waterfall = waterfall))
})

output
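The structure of train_test can be checked without running any model. The sketch below rebuilds it and, as a convenience that is not part of the original answer, derives a label for each pair that could be assigned to names(output). The label format is an assumption chosen for illustration.

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# The inner lapply pairs one training criterion with every test criterion;
# unlist(recursive = FALSE) flattens one level, leaving a flat list of pairs
train_test <- unlist(
  lapply(use_train, function(x) {
    lapply(use_test, function(y) list(train = x, test = y))
  }),
  recursive = FALSE
)

# Number of train/test pairs (2 x 2 combinations)
n_pairs <- length(train_test)

# Hypothetical labels, one per pair, e.g. for names(output)
pair_labels <- vapply(train_test, function(tt) {
  paste0(paste(tt$train, collapse = "+"), "_vs_", paste(tt$test, collapse = "+"))
}, character(1))
```

With labels in place, a single run could be retrieved by name rather than by position, which may be easier to read once the two criteria lists diverge.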

2 answers

Answer 1
3 2023-01-07 14:39:00

Answer 2
1 2023-01-07 15:47:41
