Snakemake: How do I use one integer from a list as the script input for each call?

Question

I'm trying to practice writing workflows in Snakemake.

Contents of my Snakefile:

configfile: "config.yaml"

rule get_col:
  input:
   expand("data/{file}.csv",file=config["datname"])
  output:
   expand("output/{file}_col{param}.csv",file=config["datname"],param=config["cols"])
  params:
   col=config["cols"]
  script:
   "scripts/getCols.R"

Contents of config.yaml:

cols:
  [2,4]
datname:
  "GSE3790_expression_data"

My R script:

getCols=function(input,output,col) {
  dat=read.csv(input)
  dat=dat[,col]
  write.csv(dat,output,row.names=F)
}

getCols(snakemake@input[[1]],snakemake@output[[1]],snakemake@params[['col']])

It seems both columns are being passed at the same time. What I'm trying to do is pass one column from the list for each output file.

Since the second output never gets a chance to be created (both columns are used to create the first output), Snakemake throws an error:

Waiting at most 5 seconds for missing files.
MissingOutputException in line 3 of /Users/rebecca/Desktop/snakemake-tutorial/practice/Snakefile:
Job completed successfully, but some output files are missing.
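To make the failure concrete, here is a plain-Python simulation of what the single job does. It is only a sketch: run_original_job and missing_outputs are hypothetical helpers that mimic the R script and Snakemake's check that every declared output exists after a job; they are not Snakemake code. Because expand put both file names into one rule's output, one job owns both files, params.col is the whole list [2, 4], and the script writes only the first output:

```python
# Hypothetical simulation of the failing job; NOT Snakemake internals.
import os
import tempfile


def run_original_job(outputs, cols, workdir):
    """Mimic scripts/getCols.R: it writes only output[[1]], using every column in `cols`."""
    first = os.path.join(workdir, outputs[0])
    os.makedirs(os.path.dirname(first), exist_ok=True)
    with open(first, "w") as fh:
        # Both columns end up in the first file; the second file is never written.
        fh.write(f"selected columns: {cols}\n")


def missing_outputs(outputs, workdir):
    """Mimic the post-job check: every declared output file must exist."""
    return [p for p in outputs if not os.path.exists(os.path.join(workdir, p))]


# The two outputs that expand() assigned to ONE job of rule get_col.
outputs = ["output/GSE3790_expression_data_col2.csv",
           "output/GSE3790_expression_data_col4.csv"]

with tempfile.TemporaryDirectory() as tmp:
    run_original_job(outputs, [2, 4], tmp)
    missing = missing_outputs(outputs, tmp)

print(missing)  # prints ['output/GSE3790_expression_data_col4.csv']
```

The non-empty list of missing files corresponds to the MissingOutputException above.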

On a somewhat unrelated note, I thought I could write the input as "data/{file}.csv", but that returns:

WildcardError in line 4 of /Users/rebecca/Desktop/snakemake-tutorial/practice/Snakefile:
Wildcards in input files cannot be determined from output files:
'file'

Any help would be greatly appreciated!

Answer 1

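It looks like you want to run the R script twice per file, once for each value of col. In that case the rule also needs to be executed twice, as two separate jobs. In my opinion, using expand here is also overkill. expand fills your wildcards with every possible value and returns the list of resulting file names. So the output of this rule becomes every combination of files and cols, which a simple script cannot create in a single run: your R script writes only snakemake@output[[1]], so the second file is never created. This is also why file cannot be inferred from the output: it has already been expanded there, so no {file} wildcard is left to match.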
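A simplified stand-in for expand shows what the question's output: line turns into. expand_like is a hypothetical helper written for illustration, not Snakemake's implementation; it assumes the default behaviour of taking every combination of the wildcard values and treating a plain string as a single value:

```python
# Hypothetical stand-in for snakemake's expand(); NOT the real implementation.
from itertools import product


def expand_like(pattern, **wildcards):
    # A plain string counts as ONE value, not as a sequence of characters.
    values = {k: [v] if isinstance(v, str) else list(v) for k, v in wildcards.items()}
    keys = list(values)
    # Every combination of wildcard values yields one concrete file name.
    return [pattern.format(**dict(zip(keys, combo)))
            for combo in product(*(values[k] for k in keys))]


config = {"datname": "GSE3790_expression_data", "cols": [2, 4]}
outputs = expand_like("output/{file}_col{param}.csv",
                      file=config["datname"], param=config["cols"])
print(outputs)
# prints ['output/GSE3790_expression_data_col2.csv', 'output/GSE3790_expression_data_col4.csv']
```

Both names are concrete strings with no wildcards left, and all of them belong to a single job. With two datasets in datname, the same call would yield four names, still claimed by one job.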

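Instead, try writing the rule more simply, for just one file and one column, and expand the resulting outputs in whichever rule needs them as input. If these are the final outputs of your workflow, put them as the input of rule all to tell the workflow what the final target is.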

rule all:
  input:
    expand("output/{file}_col{param}.csv",
    file=config["datname"], param=config["cols"])

rule get_col:
  input:
    "data/{file}.csv"
  output:
    "output/{file}_col{param}.csv"
  params:
    col=lambda wc: int(wc.param)  # wildcard values are strings; R needs a number to select a column by position
  script:
    "scripts/getCols.R"

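Snakemake will infer what needs to be done from rule all (or from any other rule that uses these outputs as input) and call rule get_col accordingly: once per requested output file, with file and param filled in from that file's name.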
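Roughly, the inference works by matching each requested file name against the rule's output pattern. The sketch below models that in plain Python, assuming the default wildcard regex .+; pattern_to_regex and infer_job are hypothetical names, not Snakemake's API. Captured wildcard values are strings, so the column number is converted with int() before it would reach R. The last lines show why the question's version raised the WildcardError: once output is already expanded, there is no {file} left to bind, so the input pattern cannot be filled.

```python
# Hypothetical model of wildcard inference; NOT Snakemake's actual code.
import re


def pattern_to_regex(pattern):
    """Turn 'output/{file}_col{param}.csv' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Even indices are literal text, odd indices are wildcard names (default regex .+).
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex + "$")


def infer_job(target, output_pattern, input_pattern):
    """Work out one job's wildcards, input and params from a requested output file."""
    m = pattern_to_regex(output_pattern).match(target)
    if m is None:
        return None  # this rule cannot produce `target`
    wc = m.groupdict()  # wildcard values are always strings
    return {
        "wildcards": wc,
        "input": input_pattern.format(**wc),  # KeyError if a wildcard is unbound
        "params": {"col": int(wc["param"])},
    }


# Targets requested by rule all: one job per file.
targets = [f"output/GSE3790_expression_data_col{c}.csv" for c in (2, 4)]
jobs = [infer_job(t, "output/{file}_col{param}.csv", "data/{file}.csv") for t in targets]
for job in jobs:
    print(job)

# The question's variant: the output "pattern" is an already-expanded name without wildcards.
try:
    infer_job(targets[0], targets[0], "data/{file}.csv")
except KeyError as err:
    print("cannot determine wildcard:", err)
```

Each requested file gives its own job with a single integer column, which is exactly what the per-file, per-column rule needs.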


Solution 1
1 Accepted 2020-10-28 08:52:49
