How to pass an output from one process to another process in Nextflow?

I am new to Nextflow. I am trying to pass the BAM file produced by the ALIGNMENT process to the MKDUP process, but it throws a null value error. I think I understand the cause: it comes from the ${sample_id} in the MKDUP process. But I don't know how to fix it.


params.rawFiles = "/mnt/NGS1/WES_Analysis/test/*_{1,2}.fq.gz"
params.genome = "/mnt/NGS1/WES_Analysis/Database/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta"
params.outDir = "/mnt/NGS1/WES_Analysis/test/Output"

// println "reads: $params.rawFiles"

log.info """\
         genome: ${params.genome}
         rawFiles: ${params.rawFiles}
         Output Dir: ${params.outDir}
         """
         .stripIndent()



process FASTP {
    debug true

    publishDir "${params.outDir}/${sample_id}/Quality", mode:"copy"
    
    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}_trim_*.fq.gz"), emit: reads
    path("${sample_id}.fastp.json"), emit: json
    path("${sample_id}.fastp.html"), emit: html

    script:
    """
    echo ${reads[0]}
    fastp --in1 ${reads[0]} --in2 ${reads[1]} \
    -q 20 -u 20 -l 40 --detect_adapter_for_pe \
    --out1 ${sample_id}_trim_1.fq.gz --out2 ${sample_id}_trim_2.fq.gz \
    -w 16 --json ${sample_id}.fastp.json --html ${sample_id}.fastp.html
    """
}


process ALIGNMENT {
    debug true

    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"

    input: 
    tuple val(sample_id), path(reads)

    output:
    path("${sample_id}_alignSort.bam"), emit: alignSortBam
    path("${sample_id}_alignSort.bam.bai")



    script:
    """
    echo ${reads[0]}
    bwa mem -M -t 70 \
    ${params.genome} ${reads[0]} ${reads[1]} \
    -R "@RG\\tID:${sample_id}\\tSM:${sample_id}\\tPL:MGI\\tPU:Lane1\\tLB:MGI" | samtools view -S -b | \
    samtools sort -o ${sample_id}_alignSort.bam
    samtools index -@ 7 ${sample_id}_alignSort.bam ${sample_id}_alignSort.bam.bai
    echo The path of sample_alignSort.bam is: ${sample_id}_alignSort.bam
    """
}

process MKDUP {
    debug true

    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"

    input: 
    tuple val(sample_id), path(alignSortBam)

    output:
    path("${sample_id}_alignSortMkDup.bam")

    script:
    """
    echo mkdup :- ${alignSortBam}
    gatk MarkDuplicatesSpark -OBI true \
    -I ${alignSortBam} \
    -O ${sample_id}_alignSortMkDup.bam \
    -M ${sample_id}_metrics.txt
    """
}

workflow{
    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )

    FASTP( read_pairs_ch )
    ALIGNMENT(FASTP.out.reads)
    MKDUP(ALIGNMENT.out.alignSortBam)
}

Ideally, the input and output declarations of connected processes should match. Here, ALIGNMENT emits only a path, while MKDUP expects a tuple of a sample_id value and a file, which is why ${sample_id} has no value in MKDUP. You could use the map operator to transform the items in the channel, but the usual way is to have the ALIGNMENT output declaration also produce a tuple, which then matches the MKDUP input declaration. It's also best to keep your BAM and index file together in one output, rather than use a separate output channel. For example:

params.rawFiles = "/mnt/NGS1/WES_Analysis/test/*_{1,2}.fq.gz"
params.genome = "/mnt/NGS1/WES_Analysis/Database/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta"
params.outDir = "/mnt/NGS1/WES_Analysis/test/Output"

process FASTP {

    tag { sample_id }

    publishDir "${params.outDir}/${sample_id}/Quality", mode:"copy"
    cpus 16

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}_trim_{1,2}.fq.gz"), emit: reads
    path("${sample_id}.fastp.json"), emit: json
    path("${sample_id}.fastp.html"), emit: html

    script:
    def (r1, r2) = reads

    """
    fastp \\
        --in1 "${r1}" \\
        --in2 "${r2}" \\
        -q 20 \\
        -u 20 \\
        -l 40 \\
        --detect_adapter_for_pe \\
        --out1 "${sample_id}_trim_1.fq.gz" \\
        --out2 "${sample_id}_trim_2.fq.gz" \\
        -w ${task.cpus} \\
        --json "${sample_id}.fastp.json" \\
        --html "${sample_id}.fastp.html"
    """
}

process ALIGNMENT {

    tag { sample_id }

    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"
    cpus 32

    input:
    tuple val(sample_id), path(reads)
    path bwa_index, stageAs: 'bwa_index/*'

    output:
    tuple val(sample_id), path("${sample_id}_alignSort.bam{,.bai}")

    script:
    def idxbase = bwa_index[0].baseName
    def (r1, r2) = reads

    """
    bwa mem \\
        -t ${task.cpus} \\
        -R "@RG\\tID:${sample_id}\\tSM:${sample_id}\\tPL:MGI\\tPU:Lane1\\tLB:MGI" \\
        -M \\
        "bwa_index/${idxbase}" \\
        "${r1}" \\
        "${r2}" |
    samtools view -S -b |
    samtools sort -o "${sample_id}_alignSort.bam"
    samtools index "${sample_id}_alignSort.bam"
    """
}

process MKDUP {

    tag { sample_id }

    publishDir "${params.outDir}/${sample_id}/BAM_Files", mode:"copy"

    input:
    tuple val(sample_id), path(indexed_bam)

    output:
    tuple val(sample_id), path("${sample_id}_alignSortMkDup.bam"), emit: bam
    path "${sample_id}_metrics.txt", emit: metrics

    script:
    """
    gatk MarkDuplicatesSpark -OBI true \\
        -I "${indexed_bam.first()}" \\
        -O "${sample_id}_alignSortMkDup.bam" \\
        -M "${sample_id}_metrics.txt"
    """
}

workflow {

    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )
    bwa_index = Channel
        .fromPath( "${params.genome}.{amb,ann,bwt,pac,sa}" )
        .collect()

    FASTP( read_pairs_ch )
    ALIGNMENT( FASTP.out.reads, bwa_index )
    MKDUP( ALIGNMENT.out )
}
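
For comparison, the map operator route mentioned above could look roughly like the sketch below. It assumes the processes from the question are left unchanged (ALIGNMENT takes a single input and emits a bare path as alignSortBam), and it recovers the sample ID by stripping the "_alignSort.bam" suffix from the file name. The channel name bam_ch is only illustrative.

```groovy
workflow {
    read_pairs_ch = Channel.fromFilePairs( params.rawFiles )

    FASTP( read_pairs_ch )
    ALIGNMENT( FASTP.out.reads )

    // ALIGNMENT.out.alignSortBam emits bare paths, so rebuild the
    // (sample_id, bam) tuple that MKDUP's input declaration expects.
    // Assumption: the file is named "<sample_id>_alignSort.bam".
    bam_ch = ALIGNMENT.out.alignSortBam
        .map { bam -> tuple( bam.name - '_alignSort.bam', bam ) }

    MKDUP( bam_ch )
}
```

This works only as long as the sample ID can be reconstructed from the file name, and it does not stage the .bai index into MKDUP. Emitting the tuple directly from ALIGNMENT, as in the answer above, avoids both limitations.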
