Missing input files for rule all in snakemake

I am trying to build a Snakemake pipeline for biosynthetic gene cluster detection, but I am struggling with this error:

Missing input files for rule all:
antismash-output/Unmap_09/Unmap_09.txt
antismash-output/Unmap_12/Unmap_12.txt
antismash-output/Unmap_18/Unmap_18.txt

and so on for more files. As far as I can tell, the file generation in the Snakefile should work:

workdir: config["path_to_files"]
wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = config["samples"]

rule all:
    input:
        expand("antismash-output/{sample}/{sample}.txt", sample = config["samples"])

# merging the paired end reads (either fasta or fastq) as prodigal only takes single end reads
rule pear:
    input:
        forward = "{sample}{separator}1.{extension}",
        reverse = "{sample}{separator}2.{extension}"

    output:
        "merged_reads/{sample}.{extension}"

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "pear -f {input.forward} -r {input.reverse} -o {output} -t 21"

# If single end then move them to merged_reads directory
rule move:
    input:
        "{sample}.{extension}"

    output:
        "merged_reads/{sample}.{extension}"

    shell:
        "cp {path}/{sample}.{extension} {path}/merged_reads/"

# Setting the rule order on the 2 above rules which should be treated equally and only one run.
ruleorder: pear > move
# annotating the metagenome with prodigal#. Can be done inside antiSMASH but prefer to do it out
rule prodigal:
    input:
        "merged_reads/{sample}.{extension}"

    output:
        gbk_files = "annotated_reads/{sample}.gbk",
        protein_files = "protein_reads/{sample}.faa"

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "prodigal -i {input} -o {output.gbk_files} -a {output.protein_files} -p meta"

# running antiSMASH on the annotated metagenome
rule antiSMASH:
    input:
        "annotated_reads/{sample}.gbk"

    output:
        touch("antismash-output/{sample}/{sample}.txt")

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "antismash --knownclusterblast --subclusterblast --full-hmmer --smcog --outputfolder antismash-output/{wildcards.sample}/ {input}"

Here is an example of my config.yaml file:

file_extension: fastq
path_to_files: /home/lamma/ABR/Each_reads
samples:
- Unmap_14
- Unmap_55
- Unmap_37
separator: _

I can't see where I am going wrong in the Snakefile to produce such an error. Apologies for the simple question, I am new to Snakemake.

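The problem is that you set up the global wildcard constraints incorrectly. A constraint has to be a single regular expression string, but config["samples"] is a list: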

wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = '|'.join(config["samples"])  # <-- this should fix the problem
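
To illustrate why the joined string works, here is a minimal sketch in plain Python (no Snakemake needed). It assumes the constraint is applied as a regular expression that the whole wildcard value must match. The sample names are taken from the example config.yaml; the helper matches_constraint and the use of re.escape are my own additions for illustration, not part of the fix above.

```python
import re

# Sample names as they appear in the example config.yaml.
samples = ["Unmap_14", "Unmap_55", "Unmap_37"]

# A constraint must be one regex string, not a list. Joining with "|"
# builds an alternation that matches any one of the sample names.
# re.escape is an added precaution (assumption: a sample name could
# contain regex metacharacters such as "." or "+").
sample_constraint = "|".join(re.escape(s) for s in samples)


def matches_constraint(value: str) -> bool:
    """Hypothetical helper: True if value is exactly one configured sample."""
    return re.fullmatch(sample_constraint, value) is not None


print(sample_constraint)
print(matches_constraint("Unmap_14"))        # a configured sample
print(matches_constraint("Unmap_14_extra"))  # not a configured sample
```

With the constraint in this form, a wildcard value such as Unmap_14 is accepted while anything that is not exactly one of the listed names is rejected.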

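The next problem is the extension and separator wildcards. Snakemake can only infer what these should be from other file names; you cannot actually set their values through wildcard constraints. We can use f-string syntax to fill in the values instead: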

rule pear:
    input:
        forward = f"{{sample}}{config['separator']}1.{{extension}}",
        reverse = f"{{sample}}{config['separator']}2.{{extension}}"
    ...

and:

rule prodigal:
    input:
        f"merged_reads/{{sample}}.{config['file_extension']}"
    ...
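
If the brace escaping is unclear, the following sketch shows what those f-strings evaluate to. The config dictionary here is a plain Python stand-in for Snakemake's config object, filled with the values from the example config.yaml; the variable names are only for illustration.

```python
# Stand-in for Snakemake's config dictionary (values from the example config.yaml).
config = {"separator": "_", "file_extension": "fastq"}

# Inside an f-string, {{ and }} produce literal braces, so {{sample}} survives
# as the Snakemake wildcard {sample}. Single braces are evaluated immediately,
# so the config values are substituted before Snakemake ever sees the pattern.
forward_pattern = f"{{sample}}{config['separator']}1.{{extension}}"
merged_pattern = f"merged_reads/{{sample}}.{config['file_extension']}"

print(forward_pattern)  # {sample}_1.{extension}
print(merged_pattern)   # merged_reads/{sample}.fastq
```

The resulting strings are ordinary Snakemake patterns in which only the remaining single-brace names are still wildcards.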

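If the wildcard constraints confuse you, take a look at snakemake regex, and if you are confused about the f"" syntax and when to use a single { versus a double {{ to escape them, look up a blog post about f-strings.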

Hope that helps!

(Since I can't comment yet...) There may be a problem with your relative paths; we can't see where your files actually are.

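One way to debug this is to use config["path_to_files"] to create absolute paths in input:. That will give you a better error message about where Snakemake expects the files, since input/output files are resolved relative to the working directory.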
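
As a rough sketch of that check outside Snakemake, the snippet below builds the absolute paths of the paired-end files the pear rule would look for and reports the ones that are not on disk. The function name missing_inputs is hypothetical, and the file naming ({sample}{separator}1.{extension} and ...2...) is assumed from the Snakefile in the question.

```python
import os


def missing_inputs(base_dir, samples, separator, extension):
    """Hypothetical helper: absolute paths of expected read files that do not exist."""
    missing = []
    for sample in samples:
        for mate in ("1", "2"):
            # Same naming scheme as the pear rule's inputs in the question.
            filename = f"{sample}{separator}{mate}.{extension}"
            path = os.path.abspath(os.path.join(base_dir, filename))
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Values copied from the example config.yaml in the question.
config = {
    "path_to_files": "/home/lamma/ABR/Each_reads",
    "samples": ["Unmap_14", "Unmap_55", "Unmap_37"],
    "separator": "_",
    "file_extension": "fastq",
}

for path in missing_inputs(
    config["path_to_files"],
    config["samples"],
    config["separator"],
    config["file_extension"],
):
    print("missing:", path)
```

Any path printed here is a file that the first rule in the chain cannot find, which would eventually surface as a missing input for rule all.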
