Snakemake: How to save and access sample details in config.yml file?
Can anybody help me understand whether it is possible to access sample details from a config.yml file when the sample names are not written in the Snakemake workflow? This is so I can re-use the workflow for different projects and only adjust the config file. Let me give you an example:
I have four samples that belong together and should be analyzed together. They are called sample1-4. Every sample comes with some more information, but to keep it simple here let's say it's just a name tag such as S1, S2, etc.
My config.yml file could look like this:
samples: ["sample1","sample2","sample3","sample4"]
sample1:
  tag: "S1"
sample2:
  tag: "S2"
sample3:
  tag: "S3"
sample4:
  tag: "S4"
And here is an example of the Snakefile that we use:
configfile: "config.yaml"

rule final:
    input: expand("{sample}.txt", sample=config["samples"])

rule rule1:
    output: "{sample}.txt"
    params: tag=config["{sample}"]["tag"]
    shell: """
        touch {output}
        echo {params.tag} > {output}
        """
What rule1 is trying to do is create a file named after each sample as saved in the samples variable in the config file. So far no problem. Then, I would like to print the sample tag into that file. As the code is written above, running snakemake will fail because config["{sample}"] will literally look for the {sample} variable in the config file, which doesn't exist. Instead, I need it to be replaced with the current sample that the rule is run for, e.g. sample1.
Does anybody know if this is somehow possible to do, and if yes, how I could do it?
Ideally I'd like to compress the information even more (see below), but that's further down the road.
samples:
  sample1:
    tag: "S1"
  sample2:
    tag: "S2"
  sample3:
    tag: "S3"
  sample4:
    tag: "S4"
I would suggest using a tab-delimited file to store the sample information.
sample.tab:
Sample Tag
1 S1
2 S2
You could store the path to this file in the config file, and read it in your Snakefile.
config.yaml:
sample_file: "sample.tab"
Snakefile:
import pandas as pd

configfile: "config.yaml"

# read_table is assumed to be pandas' reader; it defaults to tab-separated input.
sample_table = pd.read_table(config["sample_file"])
samples = sample_table["Sample"]
tags = sample_table["Tag"]
This way you can re-use your workflow for any number of samples, with any number of columns.
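To look up the tag for the sample a rule is currently running on, the table can be turned into a dictionary keyed by sample name. The following is only a minimal sketch in plain Python: it uses the standard-library csv module rather than read_table, and an in-memory copy of sample.tab so that it runs on its own. The names load_tags, tag_by_sample and get_tag are made up for this illustration.

```python
import csv
import io
from types import SimpleNamespace

# In-memory stand-in for sample.tab (tab-delimited, header row first).
SAMPLE_TAB = "Sample\tTag\n1\tS1\n2\tS2\n"


def load_tags(handle):
    """Return a {sample: tag} dict from a tab-delimited sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    # csv yields strings, which matches Snakemake wildcard values (always strings).
    return {row["Sample"]: row["Tag"] for row in reader}


tag_by_sample = load_tags(io.StringIO(SAMPLE_TAB))
samples = list(tag_by_sample)

# Stand-in for the wildcards object Snakemake passes to a params function.
wildcards = SimpleNamespace(sample="2")
get_tag = lambda wildcards: tag_by_sample[wildcards.sample]
print(samples, get_tag(wildcards))
```

In a Snakefile, the file would instead be opened from config["sample_file"], and a function like get_tag could be handed to params. If the sheet is read with pandas, note that a purely numeric Sample column is parsed as integers, so the keys may need converting to str before they match wildcard values.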
Apart from that, in Snakemake you can usually escape curly brackets by doubling them, so maybe you could try that.
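For what it's worth, doubling braces only yields a literal brace when a string is formatted; it does not substitute the wildcard. Here is a small plain-Python sketch of the difference, using str.format (which Snakemake's brace syntax resembles) and a hand-written dictionary as a stand-in for the loaded config file:

```python
# Hand-written stand-in for the dictionary Snakemake builds from the config file.
config = {"samples": ["sample1"], "sample1": {"tag": "S1"}}

# Doubled braces survive formatting as single literal braces.
escaped = "{{sample}}.txt".format()
# Single braces are replaced by the supplied value.
filled = "{sample}.txt".format(sample="sample1")
print(escaped, filled)

# A dictionary lookup does no formatting at all: the key is used verbatim,
# so neither "{sample}" nor "{{sample}}" matches the real key "sample1".
for key in ("{sample}", "{{sample}}", "sample1"):
    print(repr(key), key in config)
```

So escaping is useful when a literal brace has to reach a shell command, but the config lookup in the question still needs the actual sample name.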
Good luck!
In the params section, you need to provide a function of wildcards. The following modification of your workflow seems to work:
configfile: "config.yaml"

rule final:
    input: expand("{sample}.txt", sample=config["samples"])

rule rule1:
    output:
        "{sample}.txt"
    params:
        tag = lambda wildcards: config[wildcards.sample]["tag"]
    shell:
        """
        touch {output}
        echo {params.tag} > {output}
        """