How to pass the output of one process to another process in Nextflow?
I am new to Nextflow. I am trying to pass the BAM file produced by the ALIGNMENT process to the MKDUP process, but it throws a null value error. I think I understand the cause: it comes from the ${sample_id} in the MKDUP process. But I don't know how to fix it.
params.rawFiles = "/mnt/NGS1/WES_Analysis/test/*_{1,2}.fq.gz"
params.genome = "/mnt/NGS1/WES_Analysis/Database/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta"
params.outDir = "/mnt/NGS1/WES_Analysis/test/Output"

// println "reads: $params.rawFiles"

log.info """\
    genome: ${params.genome}
    rawFiles: ${params.rawFiles}
    Output Dir: ${params.outDir}
    """
    .stripIndent()

process FASTP {
    debug true
    publishDir "${params.outDir}/${sample_id}/Quality", mode:"copy"

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}_trim_*.fq.gz"), emit: reads
    path("${sample_id}.fastp.json"), emit: json
    path("${sample_id}.fastp.html"), emit: html

    script:
    """
    echo ${reads[0]}
    fastp --in1 ${reads[0]} --in2 ${reads[1]} \
        -q 20 -u 20 -l 40 --detect_adapter_for_pe \
        --out1 ${sample_id}_trim_1.fq.gz --out2 ${sample_id}_trim_2.fq.gz \
        -w 16 --json ${sample_id}.fastp.json --html ${sample_id}.fastp.html
    """
}

process ALIGNMENT {
    debug true
    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"

    input:
    tuple val(sample_id), path(reads)

    output:
    path("${sample_id}_alignSort.bam"), emit: alignSortBam
    path("${sample_id}_alignSort.bam.bai")

    script:
    """
    echo ${reads[0]}
    bwa mem -M -t 70 \
        ${params.genome} ${reads[0]} ${reads[1]} \
        -R "@RG\\tID:${sample_id}\\tSM:${sample_id}\\tPL:MGI\\tPU:Lane1\\tLB:MGI" | samtools view -S -b | \
        samtools sort -o ${sample_id}_alignSort.bam
    samtools index -@ 7 ${sample_id}_alignSort.bam ${sample_id}_alignSort.bam.bai
    echo The path of sample_alignSort.bam is: ${sample_id}_alignSort.bam
    """
}

process MKDUP {
    debug true
    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"

    input:
    tuple val(sample_id), path(alignSortBam)

    output:
    path("${sample_id}_alignSortMkDup.bam")

    script:
    """
    echo mkdup :- ${alignSortBam}
    gatk MarkDuplicatesSpark -OBI true \
        -I ${alignSortBam} \
        -O ${sample_id}_alignSortMkDup.bam \
        -M ${sample_id}_metrics.txt
    """
}

workflow {
    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )
    FASTP( read_pairs_ch )
    ALIGNMENT(FASTP.out.reads)
    MKDUP(ALIGNMENT.out.alignSortBam)
}
Ideally, you want the input and output declarations to match up. You could use the map operator to transform the items in the channel, but the usual way is to have the ALIGNMENT output declaration also produce a tuple, which then matches the MKDUP input declaration. It's also best to keep the BAM and its index file together, rather than emitting them on separate output channels. For example:
params.rawFiles = "/mnt/NGS1/WES_Analysis/test/*_{1,2}.fq.gz"
params.genome = "/mnt/NGS1/WES_Analysis/Database/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta"
params.outDir = "/mnt/NGS1/WES_Analysis/test/Output"

process FASTP {
    tag { sample_id }
    publishDir "${params.outDir}/${sample_id}/Quality", mode:"copy"
    cpus 16

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}_trim_{1,2}.fq.gz"), emit: reads
    path("${sample_id}.fastp.json"), emit: json
    path("${sample_id}.fastp.html"), emit: html

    script:
    def (r1, r2) = reads
    """
    fastp \\
        --in1 "${r1}" \\
        --in2 "${r2}" \\
        -q 20 \\
        -u 20 \\
        -l 40 \\
        --detect_adapter_for_pe \\
        --out1 "${sample_id}_trim_1.fq.gz" \\
        --out2 "${sample_id}_trim_2.fq.gz" \\
        -w ${task.cpus} \\
        --json "${sample_id}.fastp.json" \\
        --html "${sample_id}.fastp.html"
    """
}

process ALIGNMENT {
    tag { sample_id }
    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"
    cpus 32

    input:
    tuple val(sample_id), path(reads)
    path bwa_index, stageAs: 'bwa_index/*'

    output:
    tuple val(sample_id), path("${sample_id}_alignSort.bam{,.bai}")

    script:
    def idxbase = bwa_index[0].baseName
    def (r1, r2) = reads
    """
    bwa mem \\
        -t ${task.cpus} \\
        -R "@RG\\tID:${sample_id}\\tSM:${sample_id}\\tPL:MGI\\tPU:Lane1\\tLB:MGI" \\
        -M \\
        "bwa_index/${idxbase}" \\
        "${r1}" \\
        "${r2}" |
    samtools view -S -b |
    samtools sort -o "${sample_id}_alignSort.bam"
    samtools index "${sample_id}_alignSort.bam"
    """
}

process MKDUP {
    tag { sample_id }
    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"

    input:
    tuple val(sample_id), path(indexed_bam)

    output:
    tuple val(sample_id), path("${sample_id}_alignSortMkDup.bam"), emit: bam
    path "${sample_id}_metrics.txt", emit: metrics

    script:
    """
    gatk MarkDuplicatesSpark -OBI true \\
        -I "${indexed_bam.first()}" \\
        -O "${sample_id}_alignSortMkDup.bam" \\
        -M "${sample_id}_metrics.txt"
    """
}

workflow {
    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )
    bwa_index = Channel
        .fromPath( "${params.genome}.{amb,ann,bwt,pac,sa}" )
        .collect()
    FASTP( read_pairs_ch )
    ALIGNMENT( FASTP.out.reads, bwa_index )
    MKDUP( ALIGNMENT.out )
}
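
If you would rather leave the ALIGNMENT process from the question unchanged, the map operator route mentioned earlier could look roughly like the sketch below. It is not a tested drop-in: it replaces only the workflow block and relies on the FASTP, ALIGNMENT and MKDUP definitions from the question. It also assumes every BAM is named <sample_id>_alignSort.bam, and mkdup_input_ch is just a name chosen here for illustration.

```nextflow
workflow {
    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )

    FASTP( read_pairs_ch )
    ALIGNMENT( FASTP.out.reads )

    // ALIGNMENT.out.alignSortBam emits bare paths, but MKDUP expects
    // tuple val(sample_id), path(alignSortBam). Rebuild the tuple by
    // stripping the known suffix from the file name.
    // Assumption: files are named "<sample_id>_alignSort.bam".
    mkdup_input_ch = ALIGNMENT.out.alignSortBam
        .map { bam -> [ bam.name - '_alignSort.bam', bam ] }

    MKDUP( mkdup_input_ch )
}
```

Recovering the sample name from a file name is fragile if the naming ever changes, which is one reason emitting a tuple directly from ALIGNMENT is usually the better choice.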