I work with 8 paired-end FASTQ files of 150 GB each, which need to be processed by a pipeline with space-demanding sub-tasks. I tried several options, but I am still running out of disk space. I use the following invocation to limit my disk usage to 500 GB, but apparently this is not guaranteed and the pipeline exceeds 500 GB. How can I limit disk usage to a fixed value to avoid running out of disk space?
snakemake --resources disk_mb=500000 --use-conda --cores 16 -p
rule merge:
    input:
        fw="{sample}_1.fq.gz",
        rv="{sample}_2.fq.gz",
    output:
        temp("{sample}.assembled.fastq")
    resources:
        disk_mb=100000
    threads: 16
    shell:
        """
        merger-tool -f {input.fw} -r {input.rv} -o {output}
        """

rule filter:
    input:
        "{sample}.assembled.fastq"
    output:
        temp("{sample}.assembled.filtered.fastq")
    resources:
        disk_mb=100000
    shell:
        """
        filter-tool {input} {output}
        """

rule mapping:
    input:
        "{sample}.assembled.filtered.fastq"
    output:
        "{sample}_mapping_table.txt"
    resources:
        disk_mb=100000
    shell:
        """
        mapping-tool {input} {output}
        """
Snakemake does not have functionality to constrain cumulative resource usage; it can only schedule jobs in a way that respects resource constraints. That is, snakemake uses resources to limit how many jobs run concurrently, while your problem has a cumulative aspect to it: outputs of completed jobs stay on disk (the temp() files are only deleted once every job that consumes them has finished), but they no longer count against the disk_mb budget, so total usage can grow well past 500 GB. Taking a look at this answer, one way to resolve this is to introduce priority, so that downstream tasks have the highest priority. In your particular file, it seems that adding priority to the mapping rule should be sufficient:
rule mapping:
    input:
        "{sample}.assembled.filtered.fastq"
    output:
        "{sample}_mapping_table.txt"
    resources:
        disk_mb=100_000
    priority: 100
    shell:
        """
        mapping-tool {input} {output}
        """
You might also want to be careful about how many merge jobs are launched at the start of the run (to avoid filling up the disk with the results of merge before the downstream rules get a chance to run).
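If that turns out to be a problem, one option is to throttle merge with a custom resource. Snakemake lets you declare arbitrary integer resources, which are only enforced when a limit is passed via --resources; the name merge_slots below is made up for illustration:

rule merge:
    input:
        fw="{sample}_1.fq.gz",
        rv="{sample}_2.fq.gz",
    output:
        temp("{sample}.assembled.fastq")
    resources:
        disk_mb=100000,
        merge_slots=1  # arbitrary custom resource: each merge job claims one slot
    threads: 16
    shell:
        """
        merger-tool -f {input.fw} -r {input.rv} -o {output}
        """

Then cap the number of simultaneous merges, e.g. at most two, on the command line:

snakemake --resources disk_mb=500000 merge_slots=2 --use-conda --cores 16 -p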