In Snakemake, how do you use wildcards with scatter-gather processes?
I am trying to use Snakemake's scatter-gather functionality to parallelize a slow step in my workflow. However, I cannot figure out how to apply it when I am also using wildcards. For example, I have defined the wildcard library in rule all, but this does not seem to carry over to the scatter function in ScatterIntervals:

    import re

    SCATTER_COUNT = 100

    scattergather:
        split=SCATTER_COUNT

    rule all:
        input:
            expand("{library}_output.txt", library=["FC19271512", "FC19271513"])

    rule ScatterIntervals:
        input:
            "{library}_baits.interval_list"
        output:
            temp(scatter.split("tmp/{library}_baits.scatter_{scatteritem}.interval_list"))
        params:
            output_prefix = (
                lambda wildcards, output:
                    re.sub(r"\.scatter_\d+\.interval_list", "", output[0])
            ),
            scatter_count = SCATTER_COUNT
        shell:
            """
            python ScatterIntervals.py \
                -i {input} \
                -o {params.output_prefix} \
                -s {params.scatter_count}
            """

    rule ProcessIntervals:
        input:
            bam = "{library}.bam",
            baits = "tmp/{library}_baits.scatter_{scatteritem}.interval_list"
        output:
            temp("tmp/{library}_output.scatter_{scatteritem}.txt")
        shell:
            """
            python ProcessIntervals.py \
                -b {input.bam} \
                -l {input.baits} \
                -o {output}
            """

    rule GatherIntervals:
        input:
            gather.split("tmp/{library}_output.scatter_{scatteritem}.txt")
        output:
            "{library}_output.txt"
        run:
            inputs = "-i ".join(input)
            command = f"python GatherOutputs.py {inputs} -o {output[0]}"
            shell(command)

WildcardError in line 16 of Snakefile:
No values given for wildcard 'library'.
Evidently this works like expand, in that you can quote the wildcards that aren't scatteritem by doubling their braces if you want DAG resolution to deal with them:
temp(scatter.split("tmp/{{library}}_baits.scatter_{scatteritem}.interval_list"))
The same logic applies to gather.split.
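
To see why doubling the braces helps, here is a minimal pure-Python sketch of the escaping behaviour. It does not call Snakemake: scatter_split_sketch is a hypothetical stand-in that fills in only scatteritem with str.format, and the "N-of-M" item naming is an assumption based on the file names that appear in error messages like the one further down.

```python
def scatter_split_sketch(pattern, n):
    """Hypothetical stand-in for scatter.split: fill in only {scatteritem}."""
    # Assumption: scatter items are named "1-of-n", "2-of-n", ...
    return [pattern.format(scatteritem=f"{i}-of-{n}") for i in range(1, n + 1)]


# Escaped: {{library}} survives formatting as a literal {library} wildcard,
# which is left for the DAG to resolve later.
escaped = scatter_split_sketch(
    "tmp/{{library}}_baits.scatter_{scatteritem}.interval_list", 3
)
for path in escaped:
    print(path)

# Unescaped: the formatter also wants a value for {library} and fails,
# which is loosely analogous to the WildcardError in the question.
unescaped_error = None
try:
    scatter_split_sketch("tmp/{library}_baits.scatter_{scatteritem}.interval_list", 3)
except KeyError as err:
    unescaped_error = err
    print("unescaped pattern failed, no value given for:", err)
```

The escaped call leaves {library} in every returned path, so each scattered file still carries a wildcard that can be matched against the requested targets.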
I modified my code as suggested, like this:

    rule fastq_fasta:
        input: rules.trimmomatic.output.out_file
        output: "data/trimmed/{sample}.fasta"
        shell: "sed -n '1~4s/^@/>/p;2~4p' {input} > {output}"

    rule split:
        input:
            "data/trimmed/{sample}.fasta"
        params:
            scatter_count=config["scatter_count"],
            scatter_item = lambda wildcards: wildcards.scatteritem
        output:
            temp(scatter.split("data/trimmed/{{sample}}_{scatteritem}.fasta"))
        script:
            "scripts/split_files.py"

    rule process:
        input: "data/trimmed/{sample}_{scatteritem}.fasta"
        output: "data/processed/{sample}_{scatteritem}.csv"
        script:
            "scripts/split_files.py"

    rule gather:
        input:
            gather.split("data/processed/{{sample}}_{scatteritem}.csv")
        output:
            "data/processed/{sample}.csv"
        shell:
            "cat {input} > {output}"

However, I got this error:
AmbiguousRuleException: Rules fastq_to_fasta (which is the previous rule) and split are ambiguous for the file data/trimmed/Ornek_411-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-8.fasta
I tried lots of things, but either the rules are not called or I get an AmbiguousRuleException. What am I missing? Can someone help?
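
One possible explanation, offered as a sketch rather than a confirmed diagnosis: by default a Snakemake wildcard matches any non-empty string, so the output pattern of the FASTQ-to-FASTA rule, data/trimmed/{sample}.fasta, can also match the scattered files written into the same directory. The pure-Python snippet below imitates that matching with regular expressions. pattern_to_regex is a hypothetical helper, not Snakemake's API, and the file name Ornek_4_1-of-8.fasta and the character-class constraint are illustrative assumptions.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Hypothetical helper: turn "{name}" wildcards into named regex groups."""
    constraints = constraints or {}
    parts, pos = [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        # Unconstrained wildcards are modelled as ".+" (any non-empty string).
        parts.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


target = "data/trimmed/Ornek_4_1-of-8.fasta"  # hypothetical scattered file

# Output patterns of the two rules once scatteritem has been filled in.
fastq_fasta_out = pattern_to_regex("data/trimmed/{sample}.fasta")
split_out = pattern_to_regex("data/trimmed/{sample}_1-of-8.fasta")

both_match = bool(fastq_fasta_out.match(target)) and bool(split_out.match(target))
print("both rules can produce the file:", both_match)
print("fastq_fasta would see sample =", fastq_fasta_out.match(target).group("sample"))

# Assumed constraint: sample names contain only letters, digits and underscores.
constrained = pattern_to_regex(
    "data/trimmed/{sample}.fasta", {"sample": r"[A-Za-z0-9_]+"}
)
print("fastq_fasta still matches with constraint:", bool(constrained.match(target)))
```

If this is what is happening, two things may be worth trying: restrict sample with Snakemake's wildcard_constraints directive so it cannot absorb the scatter suffix, or write the scattered pieces to a separate directory (for example a sibling of data/trimmed) so the two output patterns no longer overlap. The repeated "-of-8" suffix in the reported file name would be consistent with sample swallowing a scatter suffix on each round of rule matching.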