How can I reevaluate the params of a Snakemake rule on every run?
I have a toy Snakefile:
rule DUMMY:
    output: "foo.txt"
    resources:
        mem_mb=lambda wildcards, attempt: 1024 * 2 ** (attempt - 1)
    params:
        max_mem=lambda wildcards, resources: resources.mem_mb * 1024
    shell:
        """
        echo {resources.mem_mb} {params.max_mem}
        ulimit -v {params.max_mem}
        python3 -c 'import numpy; x=numpy.ones(1_000_000_000)' && touch {output}
        """
It works as I expect with snakemake v6.6.1:
$ snakemake -v
6.6.1
$ snakemake foo.txt -j 1 --restart-times 3
Building DAG of jobs...
[...]
1024 1048576
[...]
Trying to restart job 0.
Select jobs to execute...
[...]
2048 2097152
[...]
4096 4194304
[...]
8192 8388608
[Tue May 17 11:56:42 2022]
Finished job 0.
1 of 1 steps (100%) done
[...]
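The doubling visible in the log (1024, 2048, 4096, 8192) comes directly from the attempt-based lambda; written out as plain functions, the arithmetic is:

```python
# The two callables from the rule above, as plain functions.
# mem_mb doubles with each retry attempt; max_mem converts MiB to KiB,
# which is what `ulimit -v` expects.
def mem_mb(attempt):
    return 1024 * 2 ** (attempt - 1)

def max_mem(mem_mb_value):
    return mem_mb_value * 1024

for attempt in range(1, 5):
    print(mem_mb(attempt), max_mem(mem_mb(attempt)))
# attempts 1-4 give mem_mb = 1024, 2048, 4096, 8192,
# matching the pairs echoed in the v6.6.1 log above
```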
but fails with v7.3.8:
$ snakemake -v
7.3.8
$ snakemake foo.txt -j 1 --restart-times 3
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads
----- ------- ------------- -------------
DUMMY 1 1 1
total 1 1 1
Select jobs to execute...
[Tue May 17 12:02:34 2022]
rule DUMMY:
output: foo.txt
jobid: 0
resources: tmpdir=/tmp, mem_mb=1024
1024 1048576
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "[...]/lib/python3.7/site-packages/numpy/core/numeric.py", line 204, in ones
a = empty(shape, dtype, order)
numpy.core._exceptions.MemoryError: Unable to allocate 7.45 GiB for an array with shape (1000000000,) and data type float64
[Tue May 17 12:02:35 2022]
Error in rule DUMMY:
jobid: 0
output: foo.txt
shell:
echo 1024 1048576
ulimit -v 1048576
python3 -c 'import numpy; x=numpy.ones(1_000_000_000)' && touch foo.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Trying to restart job 0.
[...]
2048 1048576
[...]
4096 1048576
[...]
Trying to restart job 0.
Select jobs to execute...
[Tue May 17 12:02:35 2022]
rule DUMMY:
output: foo.txt
jobid: 0
resources: tmpdir=/tmp, mem_mb=8192
8192 1048576
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "[...]/lib/python3.7/site-packages/numpy/core/numeric.py", line 204, in ones
a = empty(shape, dtype, order)
numpy.core._exceptions.MemoryError: Unable to allocate 7.45 GiB for an array with shape (1000000000,) and data type float64
[Tue May 17 12:02:35 2022]
Error in rule DUMMY:
jobid: 0
output: foo.txt
shell:
echo 8192 1048576
ulimit -v 1048576
python3 -c 'import numpy; x=numpy.ones(1_000_000_000)' && touch foo.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-05-17T120234.647036.snakemake.log
resources.mem_mb is updated on every attempt, but params.max_mem stays stuck at 1048576. How can I force params.max_mem to update? Is this a bug or a feature?
Since the relationship between params.max_mem and resources.mem_mb is very simple, I used bash arithmetic as a workaround:
rule DUMMY:
    output: "foo.txt"
    resources:
        mem_mb=lambda wildcards, attempt: 1024 * 2 ** (attempt - 1)
    shell:
        """
        ulimit -v $(( {resources.mem_mb} * 1024 ))
        echo {resources.mem_mb} $(ulimit -v)
        python3 -c 'import numpy; x=numpy.ones(1_000_000_000)' && touch {output}
        """
I'm posting this answer because someone may find it useful, but it is definitely not a solution for more complex relationships.
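The `$(( ))` expression above is plain bash integer arithmetic and can be checked on its own, outside of Snakemake:

```shell
# bash integer arithmetic: derive the ulimit value (KiB) from mem_mb (MiB)
mem_mb=2048
max_mem_kb=$(( mem_mb * 1024 ))
echo "$max_mem_kb"   # 2097152
```

Because the expression is evaluated by the shell at job runtime, it always sees the mem_mb value of the current attempt, which is why this sidesteps the stale-params problem.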
By default, the resources directive assumes a given resource is unlimited, so declaring an arbitrary resource and referencing it in the shell command is another solution:
rule DUMMY:
    output: "foo.txt"
    resources:
        mem_mb=lambda wildcards, attempt: 1024 * 2 ** (attempt - 1),
        max_mem=lambda wildcards, attempt: 1024 * 2 ** (attempt - 1) * 1024,
    shell:
        """
        ulimit -v {resources.max_mem}
        echo {resources.max_mem} $(ulimit -v)
        python3 -c 'import numpy; x=numpy.ones(1_000_000_000)' && touch {output}
        """