drake static branching: how to use .id in map() to make the dependency graph more readable

Question

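I am using a drake workflow to process ~100 files that are stored in a location with a very long path. These long file names make the dependency graph hard to read. Here is a minimal example: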

# example setup
library(drake)
very_long_path <- "this_is_a_very_long_file_path_which_makes_the_dependency_graph_hard_to_read"
dir.create(very_long_path)
filenames <- paste0("file_", seq(4), ".csv")
for (file in filenames) {
    file.create(file.path(very_long_path, file))
}
files <- list.files(very_long_path, full.names = TRUE)
ids <- rlang::syms(filenames)

# my drake plan
plan <- drake_plan(
    raw = target(
        readLines(file_in(!!file)),
        transform = map(file = !!files)
    )
)
plan

## A tibble: 4 x 2
#  target                                           command                                              
#  <chr>                                            <expr>                                               
#1 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~
#2 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~
#3 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~
#4 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~

vis_drake_graph(drake_config(plan)) ## very hard to read

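I read the following about .id in ?transformations:

Symbol or vector of symbols naming grouping variables to incorporate into target names. Useful for creating short target names. Set .id = FALSE to use integer indices as target name suffixes.

That is why I created ids in the code above, to give the targets short names. But changing the plan as follows did not help: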

plan <- drake_plan(
    raw = target(
        readLines(file_in(!!file)),
        transform = map(file = !!files,
                        .id = !!ids)
    )
)
plan

## A tibble: 4 x 2
#  target                                           command                                              
#  <chr>                                            <expr>                                               
#1 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~
#2 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~
#3 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~
#4 raw_this_is_a_very_long_file_path_which_makes_t~ readLines(file_in("this_is_a_very_long_file_path_whic~

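As I understand it, ids is a vector of symbols, so I do not see why this does not work. What am I missing? Is this even possible?

I also tried passing ids as a character vector, without success. I know I can set .id = FALSE to simply enumerate the elements of raw, but I would really like to keep the file names.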
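
For comparison, the .id = FALSE fallback mentioned above can be sketched as follows. This is a minimal sketch, not part of the original post: the directory name and the plan_indexed variable are hypothetical, and no files are read because drake_plan() only constructs the plan.

```r
library(drake)

# Hypothetical paths; drake_plan() does not touch the files themselves.
files <- file.path(
  "some_long_directory_name",
  paste0("file_", seq(4), ".csv")
)

# .id = FALSE drops the grouping variable values from the target names
# and falls back to integer-based suffixes.
plan_indexed <- drake_plan(
  raw = target(
    readLines(file_in(!!file)),
    transform = map(file = !!files, .id = FALSE)
  )
)

# Inspect the short target names.
plan_indexed$target
```

The names are short, but they no longer say which file each target came from, which is the drawback described above.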

Answer 1

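You are close. .id expects symbols that name grouping variables defined in map(), not the values themselves, which is presumably why .id = !!ids had no effect. All you need to do is register ids as a grouping variable and then pass that grouping variable's symbol to .id.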

library(drake)
very_long_path <- "this_is_a_very_long_file_path_which_makes_the_dependency_graph_hard_to_read"
dir.create(very_long_path)

filenames <- paste0("file_", seq(4), ".csv")

for (file in filenames) {
  file.create(file.path(very_long_path, file))
}

files <- list.files(very_long_path, full.names = TRUE)
ids <- rlang::syms(filenames)

plan <- drake_plan(
  raw = target(
    read.csv(file_in(!!file)),
    transform = map(
      file = !!files,
      id_var = !!ids, # Register the grouping variable.
      .id = id_var    # Use the existing grouping variable.
    )
  )
)

plan
#> # A tibble: 4 x 2
#>   target        command                                                         
#>   <chr>         <expr>                                                          
#> 1 raw_file_1.c… read.csv(file_in("this_is_a_very_long_file_path_which_makes_the…
#> 2 raw_file_2.c… read.csv(file_in("this_is_a_very_long_file_path_which_makes_the…
#> 3 raw_file_3.c… read.csv(file_in("this_is_a_very_long_file_path_which_makes_the…
#> 4 raw_file_4.c… read.csv(file_in("this_is_a_very_long_file_path_which_makes_the…

plan$target
#> [1] "raw_file_1.csv" "raw_file_2.csv" "raw_file_3.csv" "raw_file_4.csv"

Created on 2020-01-21 by the reprex package (v0.3.0)
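
A possible refinement of the same approach, offered as a sketch rather than as part of the accepted answer: derive the ids from the paths themselves and drop the extension, so the target names do not carry a ".csv" suffix. The names short_names and id_var are illustrative, and the paths are hypothetical stand-ins for the real files.

```r
library(drake)

# Hypothetical paths standing in for the real files; nothing is read here,
# because drake_plan() only builds the plan.
files <- file.path(
  "this_is_a_very_long_file_path_which_makes_the_dependency_graph_hard_to_read",
  paste0("file_", seq(4), ".csv")
)

# Derive short ids from the paths: drop the directory and the extension.
short_names <- tools::file_path_sans_ext(basename(files))
ids <- rlang::syms(short_names)

plan <- drake_plan(
  raw = target(
    read.csv(file_in(!!file)),
    transform = map(
      file = !!files,
      id_var = !!ids, # id_var is an arbitrary name for the grouping variable.
      .id = id_var    # Only id_var contributes to the target names.
    )
  )
)

# Inspect the resulting target names.
plan$target
```

Because files and ids are built from the same vector, they stay aligned element by element, which matters since map() pairs the grouping variables by position.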

2 · Accepted · 2020-01-21 13:43:22
