
In snakemake, how do you use wildcards with scatter-gather processes?

I am trying to use Snakemake's scatter-gather functionality to parallelize a slow step in my workflow. However, I cannot figure out how to apply it when I am also using wildcards. For example, I have defined values for the wildcard library in rule all, but they do not seem to reach the scatter function in ScatterIntervals:

import re

SCATTER_COUNT = 100
scattergather: 
    split=SCATTER_COUNT

rule all:
    input:
        expand("{library}_output.txt", library=["FC19271512", "FC19271513"])


rule ScatterIntervals:
    input:
        "{library}_baits.interval_list"
    output:
        temp(scatter.split("tmp/{library}_baits.scatter_{scatteritem}.interval_list"))
    params:
        output_prefix = (
            # scatteritem values look like "1-of-100", so match that form
            lambda wildcards, output:
            re.sub(r"\.scatter_\d+-of-\d+\.interval_list$", "", output[0])
        ),
        scatter_count = SCATTER_COUNT
    shell:
        """
        python ScatterIntervals.py \
            -i {input} \
            -o {params.output_prefix} \
            -s {params.scatter_count}
        """


rule ProcessIntervals:
    input:
        bam = "{library}.bam",
        baits = "tmp/{library}_baits.scatter_{scatteritem}.interval_list"
    output:
        temp("tmp/{library}_output.scatter_{scatteritem}.txt")
    shell:
        """
        python ProcessIntervals.py \
            -b {input.bam} \
            -l {input.baits} \
            -o {output}
        """


rule GatherIntervals:
    input:
        gather.split("tmp/{library}_output.scatter_{scatteritem}.txt")
    output:
        "{library}_output.txt"
    run:
        # Prefix every input file with its own -i flag
        inputs = " ".join(f"-i {path}" for path in input)
        command = f"python GatherOutputs.py {inputs} -o {output[0]}"
        shell(command)
    
Running this fails with:

WildcardError in line 16 of Snakefile:
No values given for wildcard 'library'.

Evidently this works like expand: any wildcard other than scatteritem has to be escaped with double braces if you want DAG resolution to fill it in later:

temp(scatter.split("tmp/{{library}}_baits.scatter_{scatteritem}.interval_list"))

The same logic applies to gather.split.
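To see why doubling the braces helps, here is a minimal sketch that imitates the expansion with plain Python str.format. This is an assumption about the mechanism for illustration only, not Snakemake's actual implementation; the scatter count of 3 is hypothetical, and the i-of-n form of scatteritem is taken from the file name in the error message quoted later in this thread.

```python
# Hypothetical small scatter count, for illustration only
n = 3

escaped = "tmp/{{library}}_baits.scatter_{scatteritem}.interval_list"
unescaped = "tmp/{library}_baits.scatter_{scatteritem}.interval_list"

# Scatter items in the "i-of-n" form seen in the thread's error message
scatter_items = [f"{i}-of-{n}" for i in range(1, n + 1)]

# Double braces survive formatting as single braces, so {library}
# is still a wildcard after scatteritem has been filled in.
expanded = [escaped.format(scatteritem=item) for item in scatter_items]

# With single braces, formatting needs a value for library right away.
try:
    unescaped.format(scatteritem=scatter_items[0])
    missing = None
except KeyError as err:
    missing = err.args[0]

for path in expanded:
    print(path)
print("missing value for:", missing)
```

Each expanded path still contains {library}, which is what lets the rule match any library requested by rule all.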

I modified my code as suggested, like this:

rule fastq_to_fasta:
    input: rules.trimmomatic.output.out_file
    output: "data/trimmed/{sample}.fasta"
    shell: "sed -n '1~4s/^@/>/p;2~4p' {input} > {output}"

rule split:
    input:
        "data/trimmed/{sample}.fasta"
    params:
        scatter_count=config["scatter_count"],
        scatter_item = lambda wildcards: wildcards.scatteritem
    output:
        temp(scatter.split("data/trimmed/{{sample}}_{scatteritem}.fasta"))
    script:
        "scripts/split_files.py"
        
rule process:
    input:"data/trimmed/{sample}_{scatteritem}.fasta"
    output:"data/processed/{sample}_{scatteritem}.csv"
    script:
        "scripts/split_files.py"

rule gather:
    input:
        gather.split("data/processed/{{sample}}_{scatteritem}.csv")
    output:
        "data/processed/{sample}.csv"
    shell:
        "cat {input} > {output}"

However, I got an AmbiguousRuleException: Rules fastq_to_fasta (the previous rule) and split are ambiguous for the file data/trimmed/Ornek_411-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-8.fasta

I have tried many things, but either the rules are not called or I get an AmbiguousRuleException. What am I missing? Can someone help?
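One likely cause of the ambiguity is that a wildcard matches any non-empty string by default, so the output pattern data/trimmed/{sample}.fasta also matches the scattered files. The sketch below imitates that matching with ordinary regular expressions. The helper pattern_to_regex, the sample name Ornek_4, and the constraint regex are all hypothetical and chosen for illustration; this is not Snakemake's own code.

```python
import re

def pattern_to_regex(pattern, constraints=None):
    """Hypothetical helper: turn a wildcard pattern into a regex.

    Each {name} becomes a named group that matches '.+' unless a
    constraint for that name is supplied.
    """
    constraints = constraints or {}
    pieces, pos = [], 0
    for match in re.finditer(r"\{(\w+)\}", pattern):
        pieces.append(re.escape(pattern[pos:match.start()]))
        name = match.group(1)
        pieces.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        pos = match.end()
    pieces.append(re.escape(pattern[pos:]))
    return re.compile("".join(pieces))

# Output patterns of the two rules, with one scatter item filled in
fastq_output = "data/trimmed/{sample}.fasta"
split_output = "data/trimmed/{sample}_1-of-8.fasta"

# Hypothetical requested file for a sample assumed to be named "Ornek_4"
target = "data/trimmed/Ornek_4_1-of-8.fasta"

# Without constraints both patterns match, with different sample values
fastq_match = pattern_to_regex(fastq_output).fullmatch(target)
split_match = pattern_to_regex(split_output).fullmatch(target)
print(fastq_match.group("sample"))  # sample swallows the scatter suffix
print(split_match.group("sample"))

# Assumed constraint: sample names are letters, an underscore, digits
constraints = {"sample": r"[A-Za-z]+_\d+"}
fastq_constrained = pattern_to_regex(fastq_output, constraints).fullmatch(target)
split_constrained = pattern_to_regex(split_output, constraints).fullmatch(target)
print(fastq_constrained is None, split_constrained is not None)
```

In a Snakefile, the corresponding options would be a wildcard_constraints entry for sample, or writing the scattered files to a directory that the fastq_to_fasta output pattern cannot match. Which one fits depends on how the real sample names look.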

