
Write a txt file with fastq pair names with Python

I'm new to Python and want to get better at it. I want to write a Python script that organizes my fastq file names into a txt file.

My files are like this:

d1_S10_L001_R1_001.fastq
d1_S10_L001_R2_001.fastq
d2_S11_L001_R1_001.fastq
d2_S11_L001_R2_001.fastq

What I want is to write a txt file like this:

d1 d1_S10_L001_R1_001.fastq d1_S10_L001_R2_001.fastq
d2 d2_S11_L001_R1_001.fastq d2_S11_L001_R2_001.fastq

Each line contains the string before the first "_", followed by the matching fastq pair. The fields are separated by "\t".

I know this should be a very simple Python task, but all I can do right now is:

import os


files = os.listdir(os.getcwd() + "/fastq")

with open("microbiome.files", "w") as myfile:
    for file in files:
        filename = file.split("_")[0]
        myfile.write(filename + "\t" + file + '\n')

This is obviously not doing the right job. It gives me:

d1 d1_S10_L001_R1_001.fastq 
d1 d1_S10_L001_R2_001.fastq
d2 d2_S11_L001_R1_001.fastq 
d2 d2_S11_L001_R2_001.fastq

How can I correct this?

Thank you so much!

You need to sort the files first:

files = sorted(os.listdir("fastq")) # normal sort should work fine

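Then you need to group by the first part of the file name (itertools.groupby only merges adjacent items, which is why the sort comes first):

import itertools

with open("microbiome.files", "w") as myfile:
    for groupID, groupItems in itertools.groupby(files, lambda x: x.split("_", 1)[0]):
        # one line per sample: prefix, then its fastq files, tab-separated
        myfile.write("{id}\t{names}\n".format(id=groupID, names="\t".join(groupItems)))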
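
Putting the sort and the grouping together, a minimal self-contained sketch might look like the following. The helper name build_pair_lines is hypothetical (it is not from the original post), and it works on an in-memory list so it can be tried without any files on disk. It assumes the sample prefix is the text before the first underscore.

```python
import itertools


def build_pair_lines(names):
    """Return one tab-separated line per sample prefix (hypothetical helper)."""
    def prefix(name):
        # text before the first underscore, e.g. "d1"
        return name.split("_", 1)[0]

    lines = []
    # groupby only merges adjacent items, so the names are sorted first
    for group_id, group_items in itertools.groupby(sorted(names), key=prefix):
        lines.append(group_id + "\t" + "\t".join(group_items))
    return lines


example = [
    "d2_S11_L001_R2_001.fastq",
    "d1_S10_L001_R1_001.fastq",
    "d2_S11_L001_R1_001.fastq",
    "d1_S10_L001_R2_001.fastq",
]

for line in build_pair_lines(example):
    print(line)
```

To write the result to disk, the returned lines can be joined with "\n" and written to microbiome.files inside a with open(...) block.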

Collect the names per prefix as Joran suggested. I prefer to use glob, which also helps if the directory might contain other file types:

import glob

files = sorted(glob.glob("*.fastq"))  # only .fastq files in the current directory
prefixes = sorted(set(f.split("_")[0] for f in files))
# match on p + "_" so that prefix "d1" does not also pick up "d10_..." files
files_dict = {p: [f for f in files if f.startswith(p + "_")] for p in prefixes}
to_write = "\n".join("{}\t{}".format(k, "\t".join(v)) for k, v in files_dict.items())
with open("microbiome.files", "w") as writer:
    writer.write(to_write + "\n")
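
As a variation on the dictionary approach, the grouping could also be done in a single pass with collections.defaultdict, which avoids rescanning the whole file list for every prefix and keys on the exact prefix, so d1 and d10 stay separate. This is only a sketch: group_by_prefix is a hypothetical name, and the input is a hard-coded list rather than the result of glob.

```python
from collections import defaultdict


def group_by_prefix(names):
    """Map each sample prefix to its sorted list of file names (hypothetical helper)."""
    groups = defaultdict(list)
    for name in sorted(names):
        # key on the exact text before the first underscore
        groups[name.split("_", 1)[0]].append(name)
    return dict(groups)


names = [
    "d1_S10_L001_R2_001.fastq",
    "d10_S12_L001_R1_001.fastq",
    "d1_S10_L001_R1_001.fastq",
    "d10_S12_L001_R2_001.fastq",
]

groups = group_by_prefix(names)
for prefix in sorted(groups):
    print(prefix + "\t" + "\t".join(groups[prefix]))
```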
