[英]How to create a plan target to cross over the results of previous map targets and a new variable?
From multiple target ( a
) created with map I have 2 other targets ( b
and d
) that iterate over the first target. 在使用map创建的多个目标(
a
)中,我有两个其他目标( b
和d
)遍历第一个目标。 Now I would like to use the results of these targets in another target. 现在,我想在另一个目标中使用这些目标的结果。 In addition I would like to cross with another variable (
model
). 另外,我还要介绍另一个变量(
model
)。
I pasted a reprex below, but for some context in my case a
describe different subset of a dataset, b
and d
pre-compute some stuff, e
apply different models on each subset using the pre-computed data. 我在下面粘贴了一个reprex,但在某些情况下,
a
描述了数据集的不同子集, b
和d
预计算了一些东西, e
使用预计算的数据对每个子集应用了不同的模型。
I tried different combination of map
cross
(like e
below) without success. 我尝试了
map
cross
其他组合(如下面的e
),但没有成功。 I tried to add all the targets names I want to use after in fn4 but it creates unnecessary crosses. 我试图在fn4中添加所有要使用的目标名称,但这会造成不必要的交叉。
library(drake)
drake_plan(
a = target(
fn1(arg1, arg2),
transform = map(
arg1 = !!c("arg11", "arg12"),
arg2 = !!c("arg21", "arg22")
)
),
b = target(
fn2(arg1),
transform = map(arg1)
),
d = target(
fn3(arg1),
transform = map(arg1)
),
e = target(
fn4(b, d, model, arg1),
transform = cross(
b,
d,
model = !!c("x", "y", "z"),
.by = arg1,
.id = c(arg1, model)
)
),
trace = TRUE
)
#> # A tibble: 18 x 10
#> target command arg1 arg2 a b d model .by e
#> <chr> <expr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a_arg11… fn1("arg11… "\"arg… "\"ar… a_arg… <NA> <NA> <NA> <NA> <NA>
#> 2 a_arg12… fn1("arg12… "\"arg… "\"ar… a_arg… <NA> <NA> <NA> <NA> <NA>
#> 3 b_arg11 fn2("arg11… "\"arg… "\"ar… a_arg… b_ar… <NA> <NA> <NA> <NA>
#> 4 b_arg12 fn2("arg12… "\"arg… "\"ar… a_arg… b_ar… <NA> <NA> <NA> <NA>
#> 5 d_arg11 fn3("arg11… "\"arg… "\"ar… a_arg… <NA> d_ar… <NA> <NA> <NA>
#> 6 d_arg12 fn3("arg12… "\"arg… "\"ar… a_arg… <NA> d_ar… <NA> <NA> <NA>
#> 7 e_NA_x fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"x… arg1 e_NA…
#> 8 e_NA_y fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"y… arg1 e_NA…
#> 9 e_NA_z fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"z… arg1 e_NA…
#> 10 e_NA_x_2 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"x… arg1 e_NA…
#> 11 e_NA_y_2 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"y… arg1 e_NA…
#> 12 e_NA_z_2 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"z… arg1 e_NA…
#> 13 e_NA_x_3 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"x… arg1 e_NA…
#> 14 e_NA_y_3 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"y… arg1 e_NA…
#> 15 e_NA_z_3 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"z… arg1 e_NA…
#> 16 e_NA_x_4 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"x… arg1 e_NA…
#> 17 e_NA_y_4 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"y… arg1 e_NA…
#> 18 e_NA_z_4 fn4(b_arg1… <NA> <NA> <NA> b_ar… d_ar… "\"z… arg1 e_NA…
Created on 2019-07-15 by the reprex package (v0.3.0) 由reprex软件包 (v0.3.0)创建于2019-07-15
It seems to work, but the arg1
and arg2
are not carried over and are not usable in fn4
and following targets. 看来可行,但是
arg1
和arg2
没有被继承并且不能在fn4
和后续目标中使用。 Should I split this step in 2 steps, if so how? 我应该将这一步分为两个步骤吗? (
map
then cross
, cross
then map
?) I tried to cross earlier, after a
, but I don't wont to recompute identical b
and d
multiple times, it may take a lot of time and memory. (
map
,然后cross
, cross
然后map
?)我想前面交叉,后a
,但我不惯于相同的重新计算b
和d
多次,它可能需要大量的时间和内存。
because many targets use the same data that need to be saved as file for the run
function (call to an external binaries), so to prevent re-computing the same thing multiple times and to save multiple times the same thing in different files (taht can be huge) I seperated all these tasks in Drake. 因为许多目标使用相同的数据,这些数据需要保存为
run
功能的文件(调用外部二进制文件),所以可以防止多次重新计算同一事物并将同一事物多次保存在不同的文件中(泰铢)可能很大)我在Drake中分离了所有这些任务。
library(drake)
library(tibble)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
path_data <- c("path/data_1.csv", "path/data_2.csv")
countries <- c("1", "2")
analysis_dir <- "path"
substudies_1 <- tribble(
~substudy, ~adjust, ~sex,
"sub1", "no", "male/female",
"sub2", "yes", "male/female"
)
models <- c("x", "y")
plan <- drake_plan(
data = target(
get_data(file_in(path)),
transform = map(path = !!path_data, country = !!countries, .id = country)
),
SNP = target(
get_SNP_data_country(SNP_gene, data),
transform = map(data, .id = country)
),
map = target(
# actually write file and save path
write_snp_map(SNP, file.path(analysis_dir, country, "SNP_map.txt")),
transform = map(SNP, .id = country)
),
ref = target(
# actually write file and save path
write_snp_ref(SNP, file.path(analysis_dir, country, "SNP_ref.txt")),
transform = map(SNP, .id = country)
),
# data_2 is managed in another target because it has a different set of substudies,
# this maybe can be tidied up, a problem for another day...
population_1 = target(
extract_population(data, sex, adjust),
transform = map(
data = data_1,
country = "1",
.data = !!substudies_1,
.id = c(substudy)
),
),
pedigree_1 = target(
extract_pedigree(data_1, population_1),
transform = map(
population_1,
.id = substudy
)
),
covariable_1 = target(
extract_covariable(data_1, population_1, adjust, sex),
transform = map(
population_1,
.id = substudy
)
),
# run_1 = target(
# run_fn(map_1, ref_1, pedigree_1, covariable_1, substudy, model, adjust, sex),
# transform = cross(population_1, model = !!models)
# ),
trace = TRUE
)
# the desired plan for the run target
run_plan <- tibble(
target = c("run_1_x_population_1_sub1", "run_1_y_population_1_sub1", "run_1_x_population_1_sub2", "run_1_y_population_1_sub2"),
command = list(
expr(run(map_1, ref_1, pedigree_1_sub1, covariable_1_sub1, "x", "sub1", "no")),
expr(run(map_1, ref_1, pedigree_1_sub1, covariable_1_sub1, "y", "sub1", "no")),
expr(run(map_1, ref_1, pedigree_1_sub2, covariable_1_sub2, "x", "sub2", "yes")),
expr(run(map_1, ref_1, pedigree_1_sub2, covariable_1_sub2, "y", "sub2", "yes"))
),
path = NA_character_,
country = "1",
population_1 = c(rep("population_1_sub1", 2), rep("population_1_sub2", 2)),
substudy = c(rep("sub1", 2), rep("sub2", 2)),
adjust = c(rep("no", 2), rep("yes", 2)),
sex = c(rep("male/female", 4)),
pedigree_1 = c(rep("pedigree_1_sub1", 2), rep("pedigree_1_sub2", 2)),
covariable_1 = c(rep("covariable_1_sub1", 2), rep("covariable_1_sub2", 2)),
model = c("x", "y", "x", "y"),
SNP = "SNP_1",
map = "map_1",
ref = "ref_1"
)
config <- drake_config(bind_rows(plan, run_plan))
vis_drake_graph(config, targets_only = TRUE)
Created on 2019-07-15 by the reprex package (v0.3.0) 由reprex软件包 (v0.3.0)创建于2019-07-15
plan: i.imgur.com/MyqoKJi.png 计划: i.imgur.com/MyqoKJi.png
Edit 2: 编辑2:
I now use the .data
parameter in a map transform using a dataframe with previous target names (using rlang::syms
) it works fine except that it doesn't work with drake::drake_plan
's max_expand
parameter. 我现在在地图转换中使用
.data
参数,该数据转换使用具有先前目标名称的数据rlang::syms
(使用rlang::syms
),除了不能与drake::drake_plan
的max_expand
参数一起使用外,它可以正常工作。 This solution is not optimal also because crafting a grid for .data
is very verbose. 此解决方案也不是最佳方案,因为为
.data
制作网格非常冗长。
Would you mind explicitly posting the plan you want without any transforms? 您是否介意无需任何转换就明确发布所需的计划?
drake_plan_source()
can help. drake_plan_source()
可以提供帮助。
One note: only combine()
understands .by
. 注意事项:只有
.by
combine()
理解.by
。 Maybe another approach is to use transform = map(.data = !!your_grid_of_combinations)
: https://ropenscilabs.github.io/drake-manual/plans.html#map . 也许另一种方法是使用
transform = map(.data = !!your_grid_of_combinations)
: https : transform = map(.data = !!your_grid_of_combinations)
。
Does the plan you want look something like this? 您想要的计划看起来像这样吗?
library(drake)
plan <- drake_plan(
a = target(
fn1(arg1, arg2),
transform = map(
arg1 = !!c("arg11", "arg12"),
arg2 = !!c("arg21", "arg22")
)
),
b = target(
fn2(arg1),
transform = map(arg1)
),
d = target(
fn3(arg1),
transform = map(arg1)
),
e = target(
fn4(b, d, model, arg1),
transform = cross(
b,
d,
model = c("x", "y", "z"),
arg1,
.id = c(arg1, model)
)
)
)
config <- drake_config(plan)
vis_drake_graph(config)
Created on 2019-07-15 by the reprex package (v0.3.0) 由reprex软件包 (v0.3.0)创建于2019-07-15
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.