How to perform simple string operations in snakemake output

I am creating my first snakemake file, and I have reached the point where I need to perform a simple string operation on the value of my output so that my shell command works as expected:

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o {output}'

I need to apply the split function to {output} so that only the file name up to the extension is used. I couldn't find anything in the docs or in related questions.

You could use the params field:

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  params:
    dir = 'out/genomes'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o {params.dir}'

Alternative solution using wildcards:

rule all:
  input: 'out/genomes.msh'

rule sketch:
  input:
    '{file}.txt'
  output:
    '{file}.msh'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o {wildcards.file}'

Untested, but I think this should work.

The advantage over the params solution is that it generalizes better.
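To see why the wildcard version generalizes, it may help to look at how a pattern such as '{file}.msh' maps a requested target onto a wildcard value. The sketch below imitates that matching with a regular expression in plain Python. `match_pattern` is a hypothetical helper written only for illustration; it is not part of Snakemake's API, and Snakemake's real matching has more features (wildcard constraints, for example).

```python
import re

def match_pattern(pattern, target):
    """Return {wildcard: value} if target fits pattern, else None.

    Hypothetical helper: each {name} becomes a named group matching
    one or more characters; everything else is matched literally.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += "(?P<%s>.+)" % m.group(1)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None

# The target requested by "rule all" fits the output pattern of "rule sketch".
print(match_pattern("{file}.msh", "out/genomes.msh"))
# A target with a different extension does not fit.
print(match_pattern("{file}.msh", "out/genomes.txt"))
```

Because the wildcard value already excludes the extension, the shell command can use {wildcards.file} directly, and the same rule serves any other .msh target.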

Avoid duplicating text. Don't use params unless you convert your inputs/outputs to wildcards + extensions. Otherwise you're left with a rule that is hard to maintain.

input:
    "{pathDIR}/{genome}.txt"
output:
    "{pathDIR}/{genome}.msh"
params:
    dir = '{pathDIR}/{genome}'

Otherwise, use Python's slice notation.
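As a rough illustration of the slicing idea in plain Python, using the output path from the question (the rsplit variant is shown only as an alternative that does not depend on the extension length):

```python
output = "out/genomes.msh"

# Drop the last four characters (".msh"); this only works when the
# extension is known to be exactly four characters long.
prefix_by_slice = output[:-4]

# Split once at the last dot and keep the left part; this does not
# depend on the length of the extension.
prefix_by_split = output.rsplit(".", 1)[0]

print(prefix_by_slice)
print(prefix_by_split)
```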

I couldn't get slice notation to work in params using the output wildcard. Here it is in the run directive.

from subprocess import call

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  run:
    callString="mash sketch -l " + str(input) + " -k 31 -s 100000 -o " + str(output)[:-4]
    print(callString)
    call(callString, shell=True)

Python underlies Snakemake. I prefer the "run" directive over the "shell" directive because I find it really unlocks a lot of that beautiful Python functionality. Accessing params and other objects works slightly differently than with the "shell" directive.

E.g.:

callString=config["mpileup_samtoolsProg"] + ' view -bh -F ' + str(config["bitFlag"]) + ' ' + str(input.inputBAM) + ' ' + wildcards.chrB2M[1:] 

A bit of a snippet of JK using the run directive.

All of the rules in my modules pretty much use the run directive.
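Since the body of a run block is ordinary Python, the command can also be assembled as an argument list rather than a concatenated string. The following is a sketch only: `build_mash_command` is a hypothetical helper, and the mash flags are simply copied from the rule in the question.

```python
import os
import shlex

def build_mash_command(input_path, output_path):
    """Build the mash call, passing the output path without its extension."""
    prefix = os.path.splitext(output_path)[0]
    return ["mash", "sketch", "-l", input_path,
            "-k", "31", "-s", "100000", "-o", prefix]

cmd = build_mash_command("out/genomes.txt", "out/genomes.msh")
# shlex.join (Python 3.8+) renders the list as a shell-quoted string for logging.
print(shlex.join(cmd))
```

Inside a run block the list could then be handed to subprocess.run(cmd, check=True), which avoids shell=True and the quoting problems that come with string concatenation.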

Best is to use params:

import os  # explicit import for os.path.splitext used in params

rule sketch:
    input:
        'out/genomes.txt'
    output:
        'out/genomes.msh'
    params:
        prefix=lambda wildcards, output: os.path.splitext(output[0])[0]
    shell:
        'mash sketch -l {input} -k 31 -s 100000 -o {params.prefix}'

It is always preferable to use params instead of using the run directive, because the run directive cannot be combined with conda environments.
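For reference, os.path.splitext splits a path into everything before the last extension and the extension itself, which is why the lambda takes element [0] of the result. A small plain-Python check follows, with a list standing in for Snakemake's output object (an assumption made only so the lambda can be called outside a workflow):

```python
import os

# Same callable as in the params entry; wildcards is unused here.
prefix = lambda wildcards, output: os.path.splitext(output[0])[0]

fake_output = ["out/genomes.msh"]  # stand-in for Snakemake's output object

print(os.path.splitext(fake_output[0]))  # (root, extension) tuple
print(prefix(None, fake_output))         # root only
```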

You could remove the extension within the shell command:

rule sketch:
  input:
    'out/genomes.txt'
  output:
    'out/genomes.msh'
  shell:
    'mash sketch -l {input} -k 31 -s 100000 -o $(echo "{output}" | sed -e "s/[.]msh$//")'
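When an extension is stripped with a regular expression, as sed does here, it is worth treating the dot as a literal character and anchoring the pattern to the end of the string. The sketch below uses Python's re.sub with count=1 to mimic sed's first-match substitution; the file name is made up to show the difference.

```python
import re

name = "out/amsh_genomes.msh"  # made-up file name for illustration

# Unescaped, unanchored pattern: "." matches any character, so the first
# hit is "amsh" in the middle of the name.
loose = re.sub(r".msh", "", name, count=1)

# Escaped dot plus end-of-string anchor: only a trailing ".msh" is removed.
strict = re.sub(r"\.msh$", "", name)

print(loose)
print(strict)
```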
