I am creating my first snakemake file, and I got to the point where I need to perform a simple string operation on the value of my output
, so that my shell
command works as expected:
rule sketch:
input:
'out/genomes.txt'
output:
'out/genomes.msh'
shell:
'mash sketch -l {input} -k 31 -s 100000 -o {output}'
I need to apply the split
function to {output}
so that only the name of the file up to the extension is used. I couldn't find anything in the docs or in related questions.
You could use the params field:
rule sketch:
input:
'out/genomes.txt'
output:
'out/genomes.msh'
params:
dir = 'out/genomes'
shell:
'mash sketch -l {input} -k 31 -s 100000 -o {params.dir}'
Alternative solution using wildcards:
rule all:
input: 'out/genomes.msh'
rule sketch:
input:
'{file}.txt'
output:
'{file}.msh'
shell:
'mash sketch -l {input} -k 31 -s 100000 -o {wildcards.file}'
Untested, but I think this should work.
The advantage over the params
solution is that it generalizes better.
Avoid duplicating text. Don't use params unless you convert your input/outputs to wildcards + extentions. Otherwise you're left with a rule that is hard to maintain.
input:
"{pathDIR}/{genome}.txt"
output:
"{pathDIR}/{genome}.msh"
params:
dir: '{pathDIR}/{genome}'
Otherwise, use Python's slice notation .
I couldn't seem to get slice notation to work in the params using the output wildcard. Here it is in the run directive.
from subprocess import call
rule sketch:
input:
'out/genomes.txt'
output:
'out/genomes.msh'
run:
callString="mash sketch -l " + str(input) + " -k 31 -s 100000 -o " + str(output)[:-4]
print(callString)
call(callString, shell=True)
Python underlies Snakemake. I prefer the "run" directive over the "shell" directive because I find it really unlocks a lot of that beautiful Python functionality. The accessing of params and various things are slightly different that with the "shell" directive.
Eg
callString=config["mpileup_samtoolsProg"] + ' view -bh -F ' + str(config["bitFlag"]) + ' ' + str(input.inputBAM) + ' ' + wildcards.chrB2M[1:]
A bit of a snippet of JK using the run directive.
All of the rules in my modules pretty much use the run directive
Best is to use params
:
rule sketch:
input:
'out/genomes.txt'
output:
'out/genomes.msh'
params:
prefix=lambda wildcards, output: os.path.splitext(output[0])[0]
shell:
'mash sketch -l {input} -k 31 -s 100000 -o {params.prefix}'
It is always preferable to use params
instead of using the run
directive, because the run
directive cannot be combined with conda environments.
You could remove the extension within the shell command
rule sketch:
input:
'out/genomes.txt'
output:
'out/genomes.msh'
shell:
'mash sketch -l {input} -k 31 -s 100000 -o $(echo "{output}" | sed -e "s/.msh//")'
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.