
Snakemake: {input:q} does not return quoted input

I am developing an ATAC-seq pipeline that runs Genrich with Snakemake.

Genrich can call peaks from more than one replicate in the same step, which avoids additional steps (e.g. IDR).

In Snakemake, I have found a way to return all the samples I want (i.e. the replicates from one condition) at the same time, but Genrich asks for comma-separated files as input, or space-separated files if each file is quoted.

Normally, the input is rendered as a list of space-separated files (e.g. file1 file2 file3), and since I don't know how to make it return comma-separated files, I tried to quote them.

In theory, since Snakemake version 5.8.0, you can refer to the input as {input:q} in the rule's shell command to get the quoted input, as said here.

However, in my case, the returned input is not quoted.

I have created a test rule to see how the input is returned:

rule genrich_merge_test:
    input:
        lambda w: expand("{condition}.sorted.bam", condition = SAMPLES.loc[SAMPLES["CONDITION"] == w.condition].NAME),
    output:
        "{condition}_peaks.narrowPeak",
    shell:
        """
        echo {input:q} > {output}     
        """

And the returned input, which is stored in the output file, is:

rep1.sorted.bam rep2.sorted.bam

Does anyone know how to solve this and return the quoted input, or return a list of comma-separated files instead of space-separated files?

Thank you.

Assuming your input filenames do not contain spaces (and if they do, I strongly encourage avoiding them), you can simply put the whole list of files in quotes. You don't need to quote each file in the list:

rule genrich:
    input:
        t= ['a.bam', 'b.bam'],
    ...
    shell:
        r"""
        Genrich -t '{input.t}' ...
        """

(Note the single quotes around '{input.t}'.)
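To see why quoting the whole list is enough, it helps to look at how the rendered command is split into arguments. The sketch below uses Python's `shlex.split` as a stand-in for POSIX shell word splitting (an approximation, not the shell itself). It assumes that a list-valued `{input.t}` is rendered as space-separated names, and it reuses the placeholder file names from the rule above.

```python
import shlex

# Placeholder file names, as in the rule above.
files = ["a.bam", "b.bam"]

# Assumption: Snakemake renders a list-valued {input.t} as space-separated names.
rendered = " ".join(files)

# Without quotes, every file becomes a separate argument.
unquoted = shlex.split(f"Genrich -t {rendered}")

# With single quotes around the placeholder, the whole list is one argument.
quoted = shlex.split(f"Genrich -t '{rendered}'")

print(unquoted)
print(quoted)
```

The first list has `a.bam` and `b.bam` as two separate items after `-t`, while the second has the single item `a.bam b.bam`, which is the form a quoted space-separated list takes when it reaches the program.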

I was thinking echo and the shell might be stripping the quotes before redirecting to the output, but checking with snakemake -p to see the command being executed shows they aren't there. It seems that quotes only show up around individual filenames when spaces are present.
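This matches the behaviour of Python's `shlex.quote`, which adds quotes only when a string contains characters the shell would treat specially. Whether `:q` calls exactly this function in your Snakemake version is an assumption worth checking against the Snakemake source. The sketch below shows the effect, and the file name with a space is a made-up example.

```python
import shlex

# The second name is hypothetical and contains a space on purpose.
names = ["rep1.sorted.bam", "rep 2.sorted.bam"]

# shlex.quote returns safe strings unchanged and wraps unsafe ones in single quotes.
quoted_names = [shlex.quote(name) for name in names]

print(" ".join(quoted_names))
```

Only the name containing a space comes back wrapped in single quotes, so a list of ordinary file names such as rep1.sorted.bam rep2.sorted.bam is left looking unquoted.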

Dariober's answer should work to quote the list, but for completeness, if you want a comma-separated list of files, use a lambda function in a params directive:

rule genrich_merge_test:
    input:
        lambda w: expand("{condition}.sorted.bam", 
                         condition=SAMPLES.loc[SAMPLES["CONDITION"] == w.condition].NAME),
    params:
        files=lambda wildcards, input: ','.join(input)
    output:
        "{condition}_peaks.narrowPeak",
    shell:
        """
        echo {params.files} > {output}     
        """

EDIT

Here is a toy example demonstrating the use of params with input:

# snakefile
inputs = expand('{wc}.out', wc=range(4))

rule all:
    input: "test_peaks.narrowPeak"

rule genrich:
    input:
        inputs
    params:
        files=lambda wildcards, input: ','.join(input)
    output:
        "test_peaks.narrowPeak",
    shell:
        """
        echo {params.files} > {output}     
        """

rule generator:
    output: touch('{file}.out')

$ snakemake -np
...
rule genrich:
    input: 0.out, 1.out, 2.out, 3.out
    output: test_peaks.narrowPeak
    jobid: 1


        echo 0.out,1.out,2.out,3.out > test_peaks.narrowPeak 
...

Also, as indicated here:

Note that in contrast to the input directive, the params directive can optionally take more arguments than only wildcards, namely input, output, threads, and resources.
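The lambda passed to params can equally be written as a named function, which is easier to test outside a workflow. This is a minimal sketch: the function name `comma_separated` is hypothetical, and the direct call at the end only imitates what Snakemake would pass in.

```python
def comma_separated(wildcards, input):
    """Join all input files of a job into one comma-separated string."""
    # The second parameter is named `input` to mirror the lambda used in the
    # rules above, even though it shadows the Python builtin.
    return ",".join(input)


# Stand-in call outside Snakemake; wildcards are not used by this function.
files = comma_separated(None, ["rep1.sorted.bam", "rep2.sorted.bam"])
print(files)
```

In a Snakefile this would be used as `files=comma_separated` under params, in place of the lambda.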
