简体   繁体   中英

How to perform simple string operations in snakemake output

I am creating my first snakemake file, and I got to the point where I need to perform a simple string operation on the value of my output , so that my shell command works as expected:

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o {output}'

I need to apply the split function to {output} so that only the name of the file up to the extension is used. I couldn't find anything in the docs or in related questions.

You could use the params field:

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  params:
    dir = 'out/genomes'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o {params.dir}'

Alternative solution using wildcards:

rule all:
  input: 'out/genomes.msh'

rule sketch:
  input:
    '{file}.txt'
  output:
    '{file}.msh'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o {wildcards.file}'

Untested, but I think this should work.

The advantage over the params solution is that it generalizes better.

Avoid duplicating text. Don't use params unless you convert your input/outputs to wildcards + extentions. Otherwise you're left with a rule that is hard to maintain.

input:
    "{pathDIR}/{genome}.txt"
output:
    "{pathDIR}/{genome}.msh"
params:
    dir: '{pathDIR}/{genome}'

Otherwise, use Python's slice notation .

I couldn't seem to get slice notation to work in the params using the output wildcard. Here it is in the run directive.

from subprocess import call

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  run:
    callString="mash sketch -l " + str(input) + " -k 31 -s 100000 -o " + str(output)[:-4]
    print(callString)
    call(callString, shell=True)

Python underlies Snakemake. I prefer the "run" directive over the "shell" directive because I find it really unlocks a lot of that beautiful Python functionality. The accessing of params and various things are slightly different that with the "shell" directive.

Eg

callString=config["mpileup_samtoolsProg"] + ' view -bh -F ' + str(config["bitFlag"]) + ' ' + str(input.inputBAM) + ' ' + wildcards.chrB2M[1:] 

A bit of a snippet of JK using the run directive.

All of the rules in my modules pretty much use the run directive

Best is to use params :

rule sketch:
    input:
        'out/genomes.txt'
    output:
        'out/genomes.msh'
    params:
        prefix=lambda wildcards, output: os.path.splitext(output[0])[0]
    shell:
        'mash sketch -l {input} -k 31 -s 100000 -o {params.prefix}'

It is always preferable to use params instead of using the run directive, because the run directive cannot be combined with conda environments.

You could remove the extension within the shell command

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o $(echo "{output}" | sed -e "s/.msh//")'

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM