Snakemake: expanding only a subset of wildcards

I can't find a solution to this probably simple problem:

I have this Snakefile, which first produces the following files:

  • data/sample1_P1.txt

  • data/sample1_P2.txt

  • data/sample2_P1.txt

  • data/sample2_P2.txt

In the next step, it concatenates all of these files into a single file, concatenated/concatenated.txt.

This is a minimal, reproducible example:

pairs = {"P1" : "P1", "P2" : "P2"}

samples = {
    "sample1": "sample1",
    "sample2": "sample2"
}

rule all:
    input: "concatenated/concatenated.txt"

rule get_txt_files:
    output:
        "data/{sample}_{pair}.txt"
    shell:
        """
        echo 1 > {output}
        """

rule concatenate:
  input:
    expand("data/{sample}_{pair}.txt", sample=samples, \
        pair=pairs)
  output:
    "concatenated/concatenated.txt"
  shell:
    "cat {input} > {output};"

My question is simple: how can I modify the rule concatenate so that it concatenates only the files that share the same sample name?

The desired output would be:

  • concatenated/sample1.txt

  • concatenated/sample2.txt

Any help would be appreciated.

EDIT

I have a very similar follow-up question, so I don't think it's necessary to open a new question:

What if my expected output were as follows:

  • data/sample1/sample1_P1

  • data/sample1/sample1_P2

  • data/sample2/sample2_P1

  • data/sample2/sample2_P2

To be clear: I only want to create a new directory per sample and move the files into that directory.

It seemed intuitive to do it like this:

pairs = {"P1" : "P1", "P2" : "P2"}

samples = {
    "sample1": "sample1",
    "sample2": "sample2"
}

rule all:
    input: expand("data/{sample}/{sample}_{pair}.txt", sample=samples, pair = pairs)

rule get_txt_files:
    output:
        "data/{sample}_{pair}.txt"
    shell:
        """
        echo 1 > {output}
        """

rule reorganise:
  input:
    expand("data/{{sample}}_{pair}.txt", \
        pair=pairs)
  output:
    "data/{sample}/{sample}_{pair}.txt"
  shell:
    "mv {input} data/{wildcards.sample}/.;"

Can you spot the problem?

Thanks a lot in advance.

rule concatenate:
  input:
    expand("data/{{sample}}_{pair}.txt", pair=pairs)
  output:
    "concatenated/{sample}.txt"
  shell:
    "cat {input} > {output};"
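
The doubled braces are what keep `sample` out of the expansion: `{{sample}}` is an escaped brace pair, so only `pair` is filled in and `{sample}` stays behind as a wildcard that is resolved from the requested output file. The plain-Python sketch below mimics that behaviour with `str.format`; `expand_sketch` is a hypothetical stand-in written for illustration, not Snakemake's actual implementation. It also shows the per-sample targets that `rule all` would presumably need to request, since `concatenated/concatenated.txt` no longer names a sample.

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand() (an assumption, for illustration).

    Fills the given wildcards with every combination of values.
    '{{name}}' is an escaped brace pair and comes out as the literal '{name}'.
    """
    names = list(wildcards)
    combos = product(*(wildcards[n] for n in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

pairs = {"P1": "P1", "P2": "P2"}
samples = {"sample1": "sample1", "sample2": "sample2"}

# Only 'pair' is expanded; '{sample}' survives as a wildcard for the rule.
per_sample_inputs = expand_sketch("data/{{sample}}_{pair}.txt", pair=pairs)
print(per_sample_inputs)

# Targets that a top-level rule would have to ask for, one per sample.
targets = expand_sketch("concatenated/{sample}.txt", sample=samples)
print(targets)
```

In the Snakefile itself, this would correspond to giving `rule all` the input `expand("concatenated/{sample}.txt", sample=samples)`.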

Answer to the question in the comment:

from snakemake.io import expand # automatically imported in Snakemake 

expand("data/{{sample}}_{pair}.txt", pair="A B C".split())
# ['data/{sample}_A.txt', 'data/{sample}_B.txt', 'data/{sample}_C.txt']
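
For the `reorganise` rule in the follow-up, one possible reading of the problem (a sketch, not a confirmed diagnosis) is that its output contains both `{sample}` and `{pair}`, so each job produces exactly one file, while the `expand` in its input hands every job all pairs of that sample. The plain-Python sketch below imitates how wildcard values could be inferred from a requested file. It assumes that each wildcard matches `.+` and that a repeated wildcard must match the same text; the helper names are hypothetical and this is not Snakemake's real matching code.

```python
import re

def pattern_to_regex(pattern):
    """Turn a pattern like 'data/{sample}_{pair}.txt' into a regex (simplified)."""
    seen = set()
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            # A repeated wildcard has to match the same text again.
            parts.append(f"(?P={name})")
        else:
            # Assumption: a wildcard matches any non-empty string, slashes included.
            parts.append(f"(?P<{name}>.+)")
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")

def infer_wildcards(output_pattern, target):
    """Return the wildcard values that make output_pattern equal target, or None."""
    m = pattern_to_regex(output_pattern).match(target)
    return m.groupdict() if m else None

target = "data/sample1/sample1_P1.txt"

# One requested file fixes both wildcards, so one job handles one file.
wc = infer_wildcards("data/{sample}/{sample}_{pair}.txt", target)
single_input = "data/{sample}_{pair}.txt".format(**wc)
print(wc, single_input)

# Under the '.+' assumption, the output pattern of get_txt_files matches the same target.
also_matches = infer_wildcards("data/{sample}_{pair}.txt", target)
print(also_matches)
```

Under these assumptions, a `reorganise` rule that takes the single file `"data/{sample}_{pair}.txt"` as input, without `expand`, would be the per-file counterpart of its output. The second match suggests that both rules could claim the same target; whether that needs resolving, for example with `wildcard_constraints`, is worth checking against the Snakemake documentation.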
