
Halt a Snakemake rule until the rule above it finishes

I have a rule that requires a folder as input. The problem is that rule merge_fastqs uses the merged folder to combine FASTQs from different lanes into one large FASTQ per sample, but rule cellranger_count kicks off at almost the same time as the merged folder is created. cellranger_count then errors because there isn't anything in the folder yet. Can I use touch or some other method to hold cellranger_count back until merge_fastqs is done?

rule all:
    input: completeflag


import glob

# use a function to identify input fastqs to circumvent barcode irregularities
def input_fastq(wildcards):
    # config['fq_glob'] already contains the %s placeholder for the sample ID
    fnames = glob.glob(config['fq_glob'] % wildcards.sampleID)
    return sorted(fnames)  # sort so R1 comes before R2 within each lane
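For reference, here is what input_fastq returns for one sample when run outside Snakemake. This is a standalone sketch: the sample ID, the two-lane layout and the glob pattern `%s_*_R[12]_001.fastq.gz` are hypothetical stand-ins for whatever config['fq_glob'] actually contains. Note that sorted() orders by lane first, so R1 and R2 files alternate across lanes, which is why it matters that merge_fastqs splits the list by pair ID rather than by position.

```python
import glob
import os
import tempfile


def input_fastq(sample_id, fq_glob):
    """Return the sorted FASTQ paths for one sample.

    fq_glob is assumed to hold a single %s placeholder for the sample ID,
    like config['fq_glob'] in the question (the pattern below is hypothetical).
    """
    return sorted(glob.glob(fq_glob % sample_id))


# Throwaway directory with two lanes of paired reads (hypothetical file names).
tmp = tempfile.mkdtemp()
for lane in ("L001", "L002"):
    for read in ("R1", "R2"):
        open(os.path.join(tmp, f"sampleA_S1_{lane}_{read}_001.fastq.gz"), "w").close()

pattern = os.path.join(tmp, "%s_*_R[12]_001.fastq.gz")
found = [os.path.basename(p) for p in input_fastq("sampleA", pattern)]
print(found)  # lane-major order: L001_R1, L001_R2, L002_R1, L002_R2
```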

# Rule used to gather the lane FASTQs and name them the way Cell Ranger expects
rule merge_fastqs:
    input: input_fastq
    output:
        'merged/{sampleID}_S1_L001_R1_001.fastq.gz',
        'merged/{sampleID}_S1_L001_R2_001.fastq.gz'
    threads: 4
    params:
        r1 = config['pair_id'][0],
        r2 = config['pair_id'][1],
    run:
        r1 = [x for x in input if params.r1 in x]
        r2 = [x for x in input if params.r2 in x]
        shell('cat %s > {output[0]}' % ' '.join(r1))
        shell('cat %s > {output[1]}' % ' '.join(r2))
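The run: block of merge_fastqs can be mirrored in plain Python to check what it produces. In this sketch merge_pairs is a hypothetical helper, and the tags '_R1_'/'_R2_' are assumed values for config['pair_id'] (a bare 'R1' would also match any path that happens to contain 'R1'). Plain byte concatenation, as with cat, works because a file made of several gzip members still decompresses as one stream.

```python
import gzip
import os
import shutil
import tempfile


def merge_pairs(inputs, r1_tag, r2_tag, out_r1, out_r2):
    """Concatenate per-lane FASTQs into one R1 and one R2 file.

    Hypothetical helper mirroring merge_fastqs: inputs are split by substring
    match on the pair tags, then byte-concatenated in order like `cat`.
    """
    r1 = [x for x in inputs if r1_tag in x]
    r2 = [x for x in inputs if r2_tag in x]
    for parts, dest in ((r1, out_r1), (r2, out_r2)):
        if not parts:
            # `cat` with no file arguments would read stdin instead of failing
            raise ValueError(f"no input files matched for {dest}")
        with open(dest, "wb") as out:
            for path in parts:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)


# Two lanes of tiny gzipped FASTQs (hypothetical names and reads).
tmp = tempfile.mkdtemp()
inputs = []
for lane in ("L001", "L002"):
    for read in ("R1", "R2"):
        path = os.path.join(tmp, f"sampleA_S1_{lane}_{read}_001.fastq.gz")
        with gzip.open(path, "wt") as fh:
            fh.write(f"@{lane}_{read}\nACGT\n+\nIIII\n")
        inputs.append(path)

out_r1 = os.path.join(tmp, "merged_R1.fastq.gz")
out_r2 = os.path.join(tmp, "merged_R2.fastq.gz")
merge_pairs(sorted(inputs), "_R1_", "_R2_", out_r1, out_r2)

# The concatenated gzip members read back as a single stream.
with gzip.open(out_r1, "rt") as fh:
    headers = [line.strip() for line in fh if line.startswith("@")]
print(headers)  # ['@L001_R1', '@L002_R1']
```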


# make sure the Cell Ranger module is loaded first: module load cellranger/6.1.2
# all necessary tools need to be in the scRNAseq reference folder
rule cellranger_count:
    input:
        'merged'
    output:
        matrix_h5 = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix.h5',
        metrics = '{sampleID}_TenXAnalysis/outs/metrics_summary.csv',
        dir = directory('{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix'),
        barcodes = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/barcodes.tsv.gz',
        features = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/features.tsv.gz',
        matrix = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/matrix.mtx.gz',
        html = '{sampleID}_TenXAnalysis/outs/web_summary.html',
    threads: 16
    params:
        # This needs to be fixed to a location
        ref = '/PATH/refdata-gex-GRCh38-2020-A',
        # Commented out for now
        #sample_id = '{sampleID}_merged'
    ## id = unique run ID string
    ## fastqs = path to the folder holding the FASTQs
    ## sample = sample name as specified in the sample sheet (prefix of the FASTQ file names)
    ## transcriptome = path to a Cell Ranger-compatible transcriptome reference
    ## localcores = how many cores Cell Ranger may use
    ## localmem = how much memory (in GB) Cell Ranger may use
    shell: """
    rm -rf {wildcards.sampleID}_TenXAnalysis

    cellranger count --id={wildcards.sampleID}_TenXAnalysis \
        --fastqs={input} \
        --sample={wildcards.sampleID} \
        --transcriptome={params.ref} \
        --localcores={threads} \
        --localmem=128
    """
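To see exactly which command the shell directive runs for a given sample, it can be rebuilt in Python. build_cellranger_cmd is a hypothetical helper that only assembles the arguments (it does not call Cell Ranger), and "sampleA" is a placeholder sample name. Because --fastqs points at the whole merged folder, Cell Ranger only finds the sample's FASTQs if merge_fastqs has already written them there.

```python
import shlex


def build_cellranger_cmd(sample_id, fastq_dir, ref, cores, mem_gb=128):
    """Rebuild the `cellranger count` call from the rule's shell directive.

    Hypothetical helper: it only returns the argument list, nothing is executed.
    """
    return [
        "cellranger", "count",
        f"--id={sample_id}_TenXAnalysis",  # unique run ID, also the output folder
        f"--fastqs={fastq_dir}",           # folder holding the merged FASTQs
        f"--sample={sample_id}",           # FASTQ file-name prefix to pick up
        f"--transcriptome={ref}",          # Cell Ranger-compatible reference
        f"--localcores={cores}",
        f"--localmem={mem_gb}",            # in GB
    ]


cmd = build_cellranger_cmd("sampleA", "merged", "/PATH/refdata-gex-GRCh38-2020-A", 16)
print(shlex.join(cmd))
```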

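One option is to make the rule wait for the output files of the previous rule (rather than the folder). Snakemake only schedules a job after the jobs that produce its declared inputs have finished, so listing the merged FASTQs as inputs forces cellranger_count to run after merge_fastqs: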

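rule cellranger_count:
    input:
        folder='merged',
        # the merged FASTQs for this sample; the job cannot start before merge_fastqs writes them
        files=rules.merge_fastqs.output
    output:
        html = '{sampleID}_TenXAnalysis/outs/web_summary.html',
        # other outputs as in the original rule
    threads: 16
    params:
        ref = '/PATH/refdata-gex-GRCh38-2020-A'
    shell:
        """
        rm -rf {wildcards.sampleID}_TenXAnalysis
        cellranger count --id={wildcards.sampleID}_TenXAnalysis \
            --fastqs={input.folder} \
            --sample={wildcards.sampleID} \
            --transcriptome={params.ref} \
            --localcores={threads} \
            --localmem=128
        """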
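Why this works: Snakemake orders jobs by file dependencies, so a job waits only for jobs that produce one of its declared inputs. The toy model below (plain Python with graphlib, Python 3.9+) is a simplified illustration, not Snakemake's scheduler, and the job and file names are hypothetical. With only 'merged' as input, no job declares that folder as an output, so there is no edge and both jobs are ready at the same time. Adding the merged FASTQs creates an edge from merge_fastqs to cellranger_count.

```python
from graphlib import TopologicalSorter


def dependencies(jobs):
    """Map each job to the jobs that produce any of its inputs.

    jobs: {name: (inputs, outputs)}. Inputs that no job produces are treated
    as pre-existing files and add no edge. Simplified model only.
    """
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    return {name: {producer[i] for i in ins if i in producer}
            for name, (ins, _) in jobs.items()}


def first_wave(jobs):
    """Jobs a parallel scheduler could start right away."""
    ts = TopologicalSorter(dependencies(jobs))
    ts.prepare()
    return set(ts.get_ready())


merged = ["merged/A_S1_L001_R1_001.fastq.gz", "merged/A_S1_L001_R2_001.fastq.gz"]
merge_job = (["raw/A_L001_R1.fastq.gz", "raw/A_L001_R2.fastq.gz"], merged)
count_out = ["A_TenXAnalysis/outs/web_summary.html"]

# Question's rule: only the folder is an input, and no job outputs 'merged'
before = {"merge_fastqs[A]": merge_job, "cellranger_count[A]": (["merged"], count_out)}
# Answer's rule: the merged FASTQs are inputs as well
after = {"merge_fastqs[A]": merge_job, "cellranger_count[A]": (["merged", *merged], count_out)}

print(first_wave(before))  # both jobs ready at once: the race from the question
print(first_wave(after))   # only merge_fastqs[A]; cellranger_count[A] must wait
```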
