
How to access retry attempts in snakemake python code?

When you execute a snakemake script with --restart-times >= 1, it will try to re-execute a failed run. Upon re-execution, the number of execution attempts can be accessed via a lambda function in "resources". However, I would like to access the number of attempts in a block of Python code outside of my rule. I have tried to pass the attempt variable from the resources block to my Python function, but to no avail. My snakemake version is 5.32.1, and a quick test with 6.0.3 behaves very similarly.

def getTargetFiles(files, attempted):
    # do stuff
    return modified_target_files

rule do_things_rule:
    input: 
        ...
    output:
        getTargetFiles("file/path.txt", resources.attempt)
    resources:
        attempt=lambda wildcards, attempt: attempt,

This unfortunately yields an error: "NameError in line 172 of xxxx.py: name 'resources' is not defined"

The closest I have come is to access "workflow.attempt", but this always seems to be set to 1. Perhaps this is the default value for attempts?

rule do_things_rule:
    input: 
        ...
    output:
        getTargetFiles("file/path.txt", workflow.attempt)

I took a look at the internals of snakemake in the hope of finding a solution. Unfortunately, my Python knowledge isn't up to the task. There are some variables one can access in place of workflow.attempt, but they do not have integer values. I am not sure whether there is a way of getting the current number of attempts by using these slightly differently:

print(snakemake.jobs.Job.attempt)
<property object at 0x7f4eecba66d0>

print(snakemake.jobs.Job._attempt)
<member '_attempt' of 'Job' objects>

Here is a minimal working example with which I could reproduce your error.

def getTargetFiles(files, attempted):
  return f"{files[:-4]}-{attempted}.txt"

rule do_things_rule:
  resources:
    nr = lambda wildcards, attempt: attempt
  output:
    getTargetFiles("test.txt", resources.nr)
  shell:
    'echo "Failing on purpose to produce file '
    '{output} at attempt {resources.nr}'
    '"; exit 1 '

Indeed, output does not know resources. I assume that this is because output needs to be evaluated before the rule has run (see below). In contrast, if you replace getTargetFiles("test.txt", resources.nr) with getTargetFiles("test.txt", 1), then the rule runs the right number of times and the shell command has access to resources.nr.
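The difference between the two cases can be sketched in plain Python. This is only an illustration of eager versus deferred evaluation, not of snakemake's internals; get_target_files is a renamed copy of the helper above, and error_message is a name introduced for this sketch.

```python
def get_target_files(path, attempted):
    # Same logic as getTargetFiles above: strip ".txt", append the attempt.
    return f"{path[:-4]}-{attempted}.txt"

# Deferred: the lambda is only stored here. Nothing is evaluated until
# someone calls it later and supplies a value for `attempt`.
nr = lambda wildcards, attempt: attempt

# Eager: the arguments of a function call are evaluated immediately,
# at a point where no object named `resources` exists yet.
try:
    output = get_target_files("test.txt", resources.nr)
except NameError as err:
    error_message = str(err)

print(error_message)
print(nr(None, 2))
```

The call in output is evaluated in the second, eager way, which is why replacing resources.nr with a literal 1 makes the error disappear.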

As far as I understand, there is a fundamental reason for this problem.

A snakemake workflow is "defined in terms of rules that define how to create output files from input files. Dependencies between the rules are determined automatically" (quote from the Tutorial). This means that snakemake needs to know which output file will be created by a rule before it can determine whether it needs to run that rule. Therefore, the number of attempts should, at least usually, not be part of the output file name.
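As a toy model of this ordering (an assumption-laden sketch, not snakemake's actual scheduler), the target name is fixed once, and only then is the job tried repeatedly with an increasing attempt counter. The names run_with_retries, flaky_job and seen are hypothetical.

```python
def run_with_retries(target, job, restart_times):
    """Toy model: `target` is decided before the first attempt is made."""
    # One initial attempt plus `restart_times` retries.
    for attempt in range(1, restart_times + 2):
        try:
            job(target, attempt)
            return attempt  # number of the attempt that succeeded
        except RuntimeError:
            continue
    return None  # every attempt failed

seen = []

def flaky_job(target, attempt):
    # Record what the job was given, then fail on the first two attempts.
    seen.append((target, attempt))
    if attempt < 3:
        raise RuntimeError("failing on purpose")

succeeded_on = run_with_retries("test.txt", flaky_job, restart_times=2)
```

In this model the job sees a different attempt number each time, but the target it is asked to produce never changes.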

Maybe you want to combine the different files of the failed attempts? However, if the rule fails, there will be no output file, even if you force its creation: snakemake removes the file (see the example below).

def getTargetFiles(files, attempted):
  return f"{files[:-4]}-{attempted}.txt"

rule combine:
  input:
    'test-1.txt'
  output:
    'test-combined.txt'
  shell:
    'cat test-[0-9]*.txt > test-combined.txt'

rule do_things_rule:
  resources:
    nr = lambda wildcards, attempt: attempt
  output:
    getTargetFiles("test.txt", 1)
  shell:
    'touch {output}; '
    'echo "Failing on purpose to produce file '
    '{output} at attempt {resources.nr}'
    '"; exit 1 '

How about leaving the number of attempts out of the filename and using resources.nr in the shell command instead?
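If the Python code itself needs the attempt number, one option that might work is to call it from a run block, where resources is available, instead of from output. The sketch below assumes this fits your use case; describe_attempt is a hypothetical helper, and the rule in the comment is untested.

```python
def describe_attempt(target, attempt):
    """Hypothetical helper: the attempt arrives as an ordinary argument."""
    return f"producing {target} at attempt {attempt}"

# Possible use inside the Snakefile (fixed output name, attempt passed
# at run time rather than when the rule is defined):
#
# rule do_things_rule:
#   resources:
#     nr = lambda wildcards, attempt: attempt
#   output:
#     "test.txt"
#   run:
#     print(describe_attempt(output[0], resources.nr))

message = describe_attempt("test.txt", 2)
```

This keeps the output file name independent of the attempt, so snakemake can still decide up front whether the rule needs to run.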

I hope that this offers a solution to your problem.
