

Snakemake: How to save and access sample details in a config.yml file?

Can anybody help me understand whether it is possible to access sample details from a config.yml file when the sample names are not hard-coded in the Snakemake workflow? That way I could reuse the workflow for different projects and only adjust the config file. Let me give you an example:

I have four samples that belong together and should be analyzed together. They are called sample1 to sample4. Every sample comes with some more information, but to keep it simple here, let's say it's just a name tag such as S1, S2, etc.

My config.yml file could look like this:

samples: ["sample1","sample2","sample3","sample4"]

sample1:
  tag: "S1"
sample2:
  tag: "S2"
sample3:
  tag: "S3"
sample4:
  tag: "S4"

And here is an example of the Snakefile that we use:

configfile: "config.yaml"

rule final: 
  input: expand("{sample}.txt", sample=config["samples"])

rule rule1:
  output:  "{sample}.txt"
  params:  tag=config["{sample}"]["tag"]
  shell:   """
           touch {output}
           echo {params.tag} > {output}
           """

What rule1 is trying to do is create one file per sample, named after each entry in the samples list of the config file. So far no problem. Then, I would like to print the sample's tag into that file. As written above, running snakemake fails because config["{sample}"] literally looks up a key called "{sample}" in the config, which doesn't exist. Instead, I need {sample} to be replaced by the sample the rule is currently being run for, e.g. sample1.
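To see the failure outside Snakemake, here is a minimal Python sketch. It assumes that, once config.yml is loaded, Snakemake's `config` behaves like the plain nested dict below (values copied from the config above); `literal_lookup` and `tag_for` are hypothetical helper names. Inside the params expression, `"{sample}"` is an ordinary Python string rather than a wildcard, so it is used as a literal key:

```python
# Assumed stand-in for Snakemake's `config` after loading config.yml.
config = {
    "samples": ["sample1", "sample2", "sample3", "sample4"],
    "sample1": {"tag": "S1"},
    "sample2": {"tag": "S2"},
    "sample3": {"tag": "S3"},
    "sample4": {"tag": "S4"},
}


def literal_lookup():
    """What config["{sample}"]["tag"] does: look up the literal key "{sample}"."""
    try:
        return config["{sample}"]["tag"]
    except KeyError as err:
        return f"KeyError: {err}"


def tag_for(sample):
    """What is actually wanted: the tag of one concrete sample name."""
    return config[sample]["tag"]


print(literal_lookup())                           # KeyError: '{sample}'
print([tag_for(s) for s in config["samples"]])    # ['S1', 'S2', 'S3', 'S4']
```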

Does anybody know whether this is possible, and if so, how I could do it?

Ideally I'd like to compress the information even more (see below), but that's further down the road.

samples:
    sample1:
        tag: "S1"
    sample2:
        tag: "S2"
    sample3:
        tag: "S3"
    sample4:
        tag: "S4"
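With this nested layout, the sample names become the keys of `config["samples"]`. A minimal Python sketch, again assuming `config` is the plain dict loaded from the YAML; `sample_names` and `tag_for` are hypothetical names, not Snakemake API:

```python
# Assumed stand-in for `config` loaded from the nested layout.
config = {
    "samples": {
        "sample1": {"tag": "S1"},
        "sample2": {"tag": "S2"},
        "sample3": {"tag": "S3"},
        "sample4": {"tag": "S4"},
    }
}

# Iterating a dict yields its keys (in insertion order), so the
# separate "samples" list from the flat layout is no longer needed.
sample_names = list(config["samples"])


def tag_for(sample):
    """Hypothetical helper: fetch one sample's tag from the nested layout."""
    return config["samples"][sample]["tag"]


print(sample_names)        # ['sample1', 'sample2', 'sample3', 'sample4']
print(tag_for("sample3"))  # S3
```

In a Snakefile, the same lookups would presumably read `expand("{sample}.txt", sample=list(config["samples"]))` and, in params, `lambda wildcards: config["samples"][wildcards.sample]["tag"]`.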

I would suggest using a tab-delimited file to store the sample information.

sample.tab:

Sample     Tag      
1          S1   
2          S2

You could store the path to this file in the config file and read it in your Snakefile.

config.yaml:

sample_file: "sample.tab"

Snakefile:

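import pandas as pd

configfile: "config.yaml"

sample_file = config["sample_file"]

# Read the tab-delimited sheet once; pandas.read_table uses sep="\t" by default.
sample_table = pd.read_table(sample_file)
samples = sample_table["Sample"]
tags    = sample_table["Tag"]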

This way you can reuse your workflow for any number of samples, with any number of columns.
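If pandas is not available, the same idea can be sketched with Python's standard `csv` module (a swapped-in technique, not what this answer uses). The sketch builds one dict per sample keyed by the `Sample` column, so extra columns are picked up automatically; the inline `SHEET` string stands in for sample.tab, and `read_sample_sheet` is a hypothetical name:

```python
import csv
import io

# Inline stand-in for sample.tab: tab-separated, header row first.
SHEET = "Sample\tTag\n1\tS1\n2\tS2\n"


def read_sample_sheet(handle):
    """Map each value of the Sample column to a dict of the remaining columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        sample = row.pop("Sample")
        table[sample] = row  # every other column, whatever it is called
    return table


samples = read_sample_sheet(io.StringIO(SHEET))
print(samples)  # {'1': {'Tag': 'S1'}, '2': {'Tag': 'S2'}}
```

With a real file, one would open `config["sample_file"]` with `newline=""` and pass the handle in; a rule could then use `lambda wildcards: samples[wildcards.sample]["Tag"]` in params. Note that csv keeps every value as a string, whereas pandas infers the `Sample` column as integers.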

Apart from that, in Snakemake you can usually escape curly braces by doubling them; maybe you could try that.
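Snakemake fills placeholders in strings such as shell commands using Python's format-string syntax, where doubled braces produce literal braces. A plain `str.format` sketch (standard Python, not Snakemake's own formatter) shows what doubling does: the doubled braces come out as a literal `{sample}` instead of being replaced by a sample name.

```python
# Single braces are placeholders; doubled braces are emitted literally.
template = "echo {tag} > {{sample}}.txt"
print(template.format(tag="S1"))  # echo S1 > {sample}.txt
```

Doubling is therefore handy when a shell command itself needs literal braces (for example an awk program), but on its own it does not insert the current sample's name.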

Good luck!

In the params section, you need to provide a function of the wildcards. The following modification of your workflow seems to work:

configfile: "config.yaml"

rule final: 
    input: expand("{sample}.txt", sample=config["samples"])

rule rule1:
    output:
        "{sample}.txt"
    params:
        tag = lambda wildcards: config[wildcards.sample]["tag"]
    shell:
        """
        touch {output}
        echo {params.tag} > {output}
        """
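Why the lambda helps: a plain expression under params is evaluated once, when the Snakefile is parsed, whereas a function is stored and called later for each job with that job's wildcards. The sketch below imitates this outside Snakemake. `SimpleNamespace` is an assumed stand-in for Snakemake's wildcards object (which exposes values as attributes such as `wildcards.sample`), and `resolve_params` is a hypothetical helper, not Snakemake API:

```python
from types import SimpleNamespace

config = {
    "samples": ["sample1", "sample2"],
    "sample1": {"tag": "S1"},
    "sample2": {"tag": "S2"},
}

# Deferred lookup: nothing is read from config until the lambda is called.
params = {"tag": lambda wildcards: config[wildcards.sample]["tag"]}


def resolve_params(params, wildcards):
    """Hypothetical helper: call each callable param with one job's wildcards."""
    return {
        name: value(wildcards) if callable(value) else value
        for name, value in params.items()
    }


for sample in config["samples"]:
    # One "job" per sample, as requested via expand() in rule final.
    job_wildcards = SimpleNamespace(sample=sample)
    print(sample, resolve_params(params, job_wildcards))
```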
