Snakemake rebuild/reschedule of jobs

I am writing a pipeline in Snakemake for people who don't have much programming knowledge, so I want them to be able to run the entire pipeline by only running snakemake all -c on the command line.

I have 2 config files in my Snakefile:

configfile: "config.yaml"
configfile: "config_samples.yaml"

These config files will be merged together by Snakemake.

config.yaml is the standard config file. config_samples.yaml is a config file whose contents change depending on the pipeline input. It looks like the following:

samples:
  CYP20130000B:
    R1: CYP20130000B_R1.fastq
    R2: CYP20130000B_R2.fastq
  SAT20020000A:
    R1: SAT20020000A_R1.fastq
    R2: SAT20020000A_R2.fastq
  ...

I am using a Python script in a Snakemake rule to generate the contents of config_samples.yaml (using the snakemake script directive). This works fine.

However, when I list all my wanted output in the all rule like so:

"config_samples.done", # flag file for rule that generates config_samples.yaml
expand("QC_raw_reads/{sample}_{direction}_fastqc.html", sample=config["samples"], direction=["R1", "R2"])

Then this won't work, because expand() will only expand to the samples that are in the current config_samples.yaml, that is, the file as it exists before the Python script generates the new config_samples.yaml with the new samples.
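To make the timing problem concrete, here is a minimal pure-Python sketch. `expand_paths` is a hypothetical, simplified stand-in for `expand()` (not Snakemake's implementation), and the sample names are made up. It only illustrates that the target list is computed once, from whatever `config` holds when the Snakefile is parsed.

```python
from itertools import product


def expand_paths(pattern, **wildcards):
    """Simplified stand-in for expand(): fill the pattern with every wildcard combination."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# State of config_samples.yaml when the Snakefile is parsed (hypothetical sample name).
config = {"samples": {"OLD_SAMPLE": {"R1": "OLD_SAMPLE_R1.fastq", "R2": "OLD_SAMPLE_R2.fastq"}}}

# The target list is computed once, here, from whatever config holds right now.
targets = expand_paths(
    "QC_raw_reads/{sample}_{direction}_fastqc.html",
    sample=config["samples"],
    direction=["R1", "R2"],
)

# Adding a sample afterwards (as a rule would do at run time) does not change
# the list that was already built.
config["samples"]["NEW_SAMPLE"] = {"R1": "NEW_SAMPLE_R1.fastq", "R2": "NEW_SAMPLE_R2.fastq"}

print(targets)
```

The printed list only contains paths for the sample that was present at parse time, which is the behaviour described above.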

This can easily be avoided by running the rule that generates config_samples.yaml separately before running the entire pipeline, but to refer back to the beginning, I want it to stay as easy as possible for non-programmers.

So, I was wondering if there is a way to let Snakemake rebuild/reschedule the jobs, so that they can be updated for the new samples.

I haven't fully digested the question and the comments underneath, but I agree with @DmitryKuzminov that the current setup is a bit contrived.

In any case, I think you can make Snakemake regenerate config_samples.yaml by adding something like the following near the top of the Snakefile, before the first rule:

import os

# Remove the stale file so the generating rule has to run again
if os.path.exists('config_samples.yaml'):
    os.remove('config_samples.yaml')

You could even make that step controllable from the command line with:

import os

# .get() avoids a KeyError when remake_config is not passed on the command line
if config.get('remake_config') == 'yes':
    if os.path.exists('config_samples.yaml'):
        os.remove('config_samples.yaml')

and then execute with:

snakemake -C remake_config='yes' ...
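The same idea can be wrapped in a small helper, which makes it easier to try outside a workflow. This is only a sketch: `remove_stale_config` is a hypothetical name, and it assumes `config` behaves like a plain dictionary, reading the flag with `config.get` so that a missing `remake_config` simply means "do nothing".

```python
import os


def remove_stale_config(config, path="config_samples.yaml"):
    """Delete `path` if the user asked for a rebuild; return True when a file was removed."""
    if config.get("remake_config") == "yes" and os.path.exists(path):
        os.remove(path)
        return True
    return False
```

In a Snakefile this would be called once near the top, before the first rule, for example as `remove_stale_config(config)`.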

Your entire problem seems to come about because of this sentence:

I am using a Python script in a Snakemake rule to generate the contents of config_samples.yaml (using the snakemake script directive). This works fine.

The solution could be as simple as realising that the Python script can run outside of a Snakemake rule, right at the beginning of your Snakefile. A Snakefile is basically a Python script, so you can just import your script into the Snakefile and execute it there.

So your Snakefile could look like this:

import config_generator
config = config_generator.add_samples(config)

rule all:
    input: expand("QC_raw_reads/{sample}_{direction}_fastqc.html", sample=config["samples"], direction=["R1", "R2"])

rule A:
...

config_generator.py would look like this:

def add_samples(config):
    # here you can calculate whatever you calculate to get config_samples
    # e.g. config["samples"] = ...
    return config
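As an illustration of what the body of add_samples might contain, here is one possible sketch. It assumes, based on the example config_samples.yaml in the question, that the reads are files named `<sample>_R1.fastq` and `<sample>_R2.fastq`; the `reads_dir` parameter and the directory scan are assumptions for this example, not something the question specifies.

```python
from pathlib import Path


def add_samples(config, reads_dir="."):
    """Build config["samples"] from the *_R1.fastq / *_R2.fastq files found in reads_dir."""
    samples = {}
    for path in sorted(Path(reads_dir).glob("*_R[12].fastq")):
        # "CYP20130000B_R1.fastq" -> sample "CYP20130000B", direction "R1"
        sample, direction = path.name[: -len(".fastq")].rsplit("_", 1)
        samples.setdefault(sample, {})[direction] = path.name
    config["samples"] = samples
    return config
```

Because this runs while the Snakefile is parsed, config["samples"] is already up to date by the time expand() builds the targets for the all rule.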
