How to limit usage of disk space in Snakemake?

I work with 8 paired-end fastq files of 150 GB each, which need to be processed by a pipeline with space-demanding sub-tasks. I tried several options, but I am still running out of disk space:

  • used temp() to delete output files once they are no longer needed
  • used disk_mb resources to limit the number of parallel jobs.

I use the following command to limit my disk usage to 500 GB, but apparently this is not guaranteed, and usage exceeds 500 GB. How can I limit disk usage to a fixed value to avoid running out of disk space?

snakemake --resources disk_mb=500000 --use-conda --cores 16 -p

rule merge:
  input:
    fw="{sample}_1.fq.gz",
    rv="{sample}_2.fq.gz",
  output:
    temp("{sample}.assembled.fastq")
  resources:
    disk_mb=100000
  threads: 16
  shell:
    """
    merger-tool -f {input.fw} -r {input.rv} -o {output}
    """


rule filter:
  input:
    "{sample}.assembled.fastq"
  output:
    temp("{sample}.assembled.filtered.fastq")
  resources:
    disk_mb=100000
  shell:
    """
    filter-tool {input} {output}
    """


rule mapping:
  input:
    "{sample}.assembled.filtered.fastq"
  output:
    "{sample}_mapping_table.txt"
  resources:
    disk_mb=100000
  shell:
    """
    mapping-tool {input} {output}
    """

Snakemake cannot enforce resource limits on running jobs; it can only schedule jobs in a way that respects the declared resource constraints.

Snakemake uses resources to limit concurrent jobs, whereas your problem has a cumulative aspect: intermediate files stay on disk after the job that wrote them has finished. Taking a look at this answer, one way to resolve this is to introduce priority, so that downstream tasks have the highest priority.
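To make the cumulative aspect concrete, here is a minimal Python sketch of a toy model. It is not Snakemake's actual scheduler. It assumes one job runs at a time, each merge and filter output is 100 GB, the mapping table is negligibly small, and a temp() file is deleted as soon as its consumer finishes. The function name and its parameters are made up for illustration.

```python
def peak_intermediate_gb(n_samples, file_gb, downstream_first):
    """Toy model: run one job at a time and track temp() files on disk."""
    chain = ("merge", "filter", "mapping")
    done = [0] * n_samples   # number of finished stages per sample
    on_disk = 0              # GB currently held by intermediate files
    peak = 0
    while min(done) < len(chain):
        pending = [i for i in range(n_samples) if done[i] < len(chain)]
        if downstream_first:
            # Prefer the sample that is furthest along its chain.
            i = max(pending, key=lambda s: done[s])
        else:
            # Prefer the sample that is closest to the start of its chain.
            i = min(pending, key=lambda s: done[s])
        rule = chain[done[i]]
        if rule != "mapping":
            on_disk += file_gb   # merge and filter each write one large file
        peak = max(peak, on_disk)  # input and output coexist while the job runs
        if rule != "merge":
            on_disk -= file_gb   # the temp() input is removed after its consumer
        done[i] += 1
    return peak


# Every single job reserves only 100 GB, well below a 500 GB cap,
# yet the files left behind by finished jobs add up.
print(peak_intermediate_gb(8, 100, downstream_first=False))  # 900
print(peak_intermediate_gb(8, 100, downstream_first=True))   # 200
```

In this model the order of execution, not the per-job reservation, determines the peak, which is why favouring downstream jobs helps.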

In your particular file, it seems that adding priority to the mapping rule should be sufficient:

rule mapping:
    input:
        "{sample}.assembled.filtered.fastq"
    output:
        "{sample}_mapping_table.txt"
    resources:
        disk_mb=100_000
    priority: 100
    shell:
        """
        mapping-tool {input} {output}
        """

You might also want to be careful about how the workflow is launched initially, to avoid filling up the disk with the results of merge before any downstream job has had a chance to run.
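One possible way to act on this, sketched here as an assumption rather than a recommendation from the answer, is to launch the workflow in small batches by passing explicit target files to snakemake. Only the intermediates of one batch then exist at a time, because temp() files are removed before the next invocation starts. The helper name, the batch size, and the sample names below are hypothetical; the flags are the ones from the question.

```python
def batched_commands(samples, batch_size, disk_mb=500000, cores=16):
    """Return one snakemake argument list per batch of samples."""
    commands = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        # Request the final output of the mapping rule for each sample.
        targets = [f"{s}_mapping_table.txt" for s in batch]
        commands.append(
            ["snakemake", "--resources", f"disk_mb={disk_mb}",
             "--use-conda", "--cores", str(cores), "-p", *targets]
        )
    return commands


# Hypothetical sample names; replace them with the real ones.
samples = [f"sample{i}" for i in range(1, 9)]
for cmd in batched_commands(samples, batch_size=2):
    print(" ".join(cmd))
```

Each list can be run in turn, for example with subprocess.run(cmd, check=True), so that a batch must finish before the next one begins.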
