
Snakemake glob_wildcards with multiple input suffixes

I am wondering if there is a way to define wildcards when the input files are named slightly differently. In this case the FASTQ files have different suffixes: some end with '_L001_R1_001.fastq.gz' and some with 'R1_001.fastq.gz'. I'm hoping to use glob_wildcards to read in the run name and sample name. Is there a good way to express "or" in glob_wildcards? Any suggestions would be fantastic, thank you in advance!

# Define samples: 
RUNS, SAMPLES = glob_wildcards(config['fastq_dir'] + "{run}/{samp}" + config['fastq1_suffix'])

My config file contains the following:

fastq_dir: 
    '~/tb/data/'
fastq1_suffix:
    '_L001_R1_001.fastq.gz'
fastq2_suffix:  
    '_L001_R2_001.fastq.gz'
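
A side note on the config above: Python's directory-walking functions do not expand a leading `~` on their own, and I am not aware of glob_wildcards doing so either, so a path like '~/tb/data/' may silently match nothing. A minimal sketch of expanding it first, where `resolve_fastq_dir` is a hypothetical helper name and not part of Snakemake:

```python
import os


def resolve_fastq_dir(raw_dir):
    """Expand a leading '~' to the user's home directory (hypothetical helper)."""
    return os.path.expanduser(raw_dir)


# In a Snakefile this would wrap config['fastq_dir'] before it is
# concatenated into the glob_wildcards pattern.
fastq_dir = resolve_fastq_dir("~/tb/data/")
```

Paths that are already absolute pass through unchanged, so wrapping the config value should be harmless either way.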

First rule:

rule trim_reads:
  input:
    p1=config['fastq_dir'] + '{run}/{samp}' + config['fastq1_suffix'],
    p2=config['fastq_dir'] + '{run}/{samp}' + config['fastq2_suffix']

One hack is to add an extra wildcard that absorbs the part of the suffix that varies, something like this:

RUNS, SAMPLES, SFX = glob_wildcards("dir/{run}/{samp}_L001{suffix}.fastq.gz")

Depending on the workflow, if SFX is truly not needed, then it can be discarded with:

RUNS, SAMPLES, _ = ...
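
Two things are worth checking with this approach. The pattern keeps `_L001` as literal text, so it only matches filenames that contain it. And because `{suffix}` matches both the R1 and R2 files, each (run, sample) pair should appear twice in the returned lists. Below is a rough sketch in plain Python, outside Snakemake, of how an optional lane tag and de-duplication could be handled with a regular expression. It assumes the shorter suffix is really '_R1_001.fastq.gz' (with a leading underscore), and `FASTQ_RE`, `unique_run_samples`, and the example paths are made up for illustration.

```python
import re

# Hypothetical pattern: run directory, sample name, an optional "_L001"
# lane tag, then R1 or R2. Assumes the short form is "_R1_001.fastq.gz".
FASTQ_RE = re.compile(
    r"^(?P<run>[^/]+)/(?P<samp>[^/]+?)(?:_L001)?_R(?P<read>[12])_001\.fastq\.gz$"
)


def unique_run_samples(rel_paths):
    """Return sorted, de-duplicated (run, sample) pairs from relative paths."""
    pairs = set()
    for path in rel_paths:
        match = FASTQ_RE.match(path)
        if match:  # skip files that follow neither naming scheme
            pairs.add((match.group("run"), match.group("samp")))
    return sorted(pairs)


# Made-up paths, relative to the FASTQ directory.
example_paths = [
    "run1/sampleA_L001_R1_001.fastq.gz",
    "run1/sampleA_L001_R2_001.fastq.gz",
    "run2/sampleB_R1_001.fastq.gz",
    "run2/sampleB_R2_001.fastq.gz",
    "run2/notes.txt",
]
PAIRS = unique_run_samples(example_paths)
RUNS = [run for run, _ in PAIRS]
SAMPLES = [samp for _, samp in PAIRS]
```

The lazy `[^/]+?` for the sample name is what lets the optional `(?:_L001)?` group claim the lane tag when it is present. The rule inputs would still need to know which suffix each sample actually has on disk, for example through an input function, and that part is not shown here.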
