I am wondering whether there is a way to define wildcards when the input files are named slightly differently. In this case, the FASTQ files have different suffixes: some end with '_L001_R1_001.fastq.gz' and others with 'R1_001.fastq.gz'. I'm hoping to use glob_wildcards to read in the run name and sample name. Is there a good way to express "or" in glob_wildcards? Any suggestions would be fantastic, thank you in advance!
```python
# Define samples:
RUNS, SAMPLES = glob_wildcards(config['fastq_dir'] + "{run}/{samp}" + config['fastq1_suffix'])
```
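Since `glob_wildcards` takes a single pattern, one plain-Python alternative is to try each accepted suffix in turn and merge the results. The sketch below uses `pathlib` instead of Snakemake; the suffix list and the function name `collect_runs_samples` are hypothetical, and it assumes the shorter suffix is really `_R1_001.fastq.gz`, so sample names do not keep a trailing underscore. `expanduser()` is applied because, as far as I know, `glob_wildcards` does not expand `~` in a path such as `'~/tb/data/'`.

```python
from pathlib import Path

# Hypothetical list of accepted R1 suffixes, longest first, so that
# '_L001_R1_001.fastq.gz' is tried before the shorter '_R1_001.fastq.gz'.
R1_SUFFIXES = ["_L001_R1_001.fastq.gz", "_R1_001.fastq.gz"]


def collect_runs_samples(fastq_dir, suffixes=R1_SUFFIXES):
    """Return parallel lists (runs, samples) for files laid out as
    <fastq_dir>/<run>/<sample><suffix>, accepting any suffix in `suffixes`."""
    # expanduser() turns a leading '~' into the user's home directory.
    root = Path(fastq_dir).expanduser()
    runs, samples = [], []
    # '*/*' lists everything exactly one directory level below root.
    for path in sorted(root.glob("*/*")):
        if not path.is_file():
            continue
        for sfx in suffixes:
            if path.name.endswith(sfx):
                runs.append(path.parent.name)
                samples.append(path.name[: -len(sfx)])
                break  # stop at the first (longest) matching suffix
    return runs, samples
```

In a Snakefile this would replace the `glob_wildcards` call with `RUNS, SAMPLES = collect_runs_samples(config['fastq_dir'])`. R2 files are skipped because none of the listed suffixes match them.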
My config file contains the following:
```yaml
fastq_dir:
  '~/tb/data/'
fastq1_suffix:
  '_L001_R1_001.fastq.gz'
fastq2_suffix:
  '_L001_R2_001.fastq.gz'
```
First rule:
```python
rule trim_reads:
    input:
        p1=config['fastq_dir'] + '{run}/{samp}' + config['fastq1_suffix'],
        p2=config['fastq_dir'] + '{run}/{samp}' + config['fastq2_suffix']
```
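However the sample list is built, the `trim_reads` input still has to point at the real file name. One option is a Snakemake input function, i.e. a callable that receives the wildcards object. The helper below, `find_fastq`, is hypothetical: it checks which candidate suffix exists on disk and returns that path. The suffix tuples are assumptions based on the two naming schemes in the question.

```python
from pathlib import Path

# Hypothetical candidate suffixes per read; the first one that exists wins.
SUFFIXES = {
    1: ("_L001_R1_001.fastq.gz", "_R1_001.fastq.gz"),
    2: ("_L001_R2_001.fastq.gz", "_R2_001.fastq.gz"),
}


def find_fastq(fastq_dir, run, samp, read):
    """Return the path of the existing FASTQ for (run, samp, read)."""
    base = Path(fastq_dir).expanduser() / run
    for sfx in SUFFIXES[read]:
        candidate = base / (samp + sfx)
        if candidate.exists():
            return str(candidate)
    # No candidate exists: fail loudly rather than return a wrong path.
    raise FileNotFoundError(f"no read-{read} FASTQ for {run}/{samp} in {base}")
```

Inside the rule it would be used as `p1=lambda wc: find_fastq(config['fastq_dir'], wc.run, wc.samp, 1)` and likewise with `2` for `p2`, so the rule no longer needs to know which suffix a given sample uses.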
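One hack is to create a new wildcard that absorbs the part of the name that differs. Because Snakemake wildcards match `.+` by default, the sample wildcard needs a non-greedy constraint so that `_L001` ends up in the new wildcard instead of in the sample name, something like this:
```python
RUNS, SAMPLES, SFX = glob_wildcards("dir/{run}/{samp,[^/]+?}{suffix,(_L001)?}_R1_001.fastq.gz")
```
Here `suffix` is either `_L001` or empty, so files from both naming schemes match.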
Depending on the workflow, if SFX is truly not needed, it can be discarded with:
```python
RUNS, SAMPLES, _ = ...
```
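To check what a constrained pattern such as `dir/{run}/{samp,[^/]+?}{suffix,(_L001)?}_R1_001.fastq.gz` captures without running Snakemake, the sketch below hand-writes a regular expression that approximates it: each `{name,constraint}` becomes a named group, `{run}` keeps the default `.+`, and the match is anchored at the end. This is an approximation for illustration, not Snakemake's own code, and `split_fastq_name` is a hypothetical name.

```python
import re

# {samp} is non-greedy so that '_L001' is left for {suffix},
# which is either '_L001' or the empty string.
PATTERN = re.compile(
    r"dir/(?P<run>.+)/(?P<samp>[^/]+?)(?P<suffix>(_L001)?)_R1_001\.fastq\.gz$"
)


def split_fastq_name(path):
    """Return (run, samp, suffix) for a matching path, or None."""
    m = PATTERN.match(path)
    if m is None:
        return None
    return m.group("run"), m.group("samp"), m.group("suffix")
```

For example, `split_fastq_name("dir/run1/S1_L001_R1_001.fastq.gz")` gives `("run1", "S1", "_L001")` and `split_fastq_name("dir/run2/S2_R1_001.fastq.gz")` gives `("run2", "S2", "")`; unpacking as `run, samp, _ = ...` drops the suffix in the same way as above.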