
Why does my python script delete itself?

I am working on a script that converts multiple fastq files into fasta and qual. Every time I run it, the script file itself ends up with zero bytes.

import sys
import re
import os
import fileinput
from Bio import SeqIO
from Bio.Alphabet import IUPAC

Directory = "/users/etc"
def process(Directory):
    filelist = os.listdir(Directory)
    for f in filelist:
        SeqIO.convert(f, "fastq", f.replace(".fastq",".qual"), "qual", alphabet=IUPAC.ambiguous_dna)

my_directory = "/users/etc"
process(my_directory)

I struggle with doing both fastq to fasta AND qual conversion at the same time - just copying the SeqIO.convert line and exchanging the file formats does not do the trick... Also, I would love to have a number printed of how many files have been converted.

Cheers

In this loop:

filelist = os.listdir(Directory)
for f in filelist:
    SeqIO.convert(f, "fastq", f.replace(".fastq",".qual"), "qual", alphabet=IUPAC.ambiguous_dna)

...you're looping over every file in your directory.

Not every file except your Python script, or every file that ends in .fastq, but every file.

Because 'yourscript.py'.replace('.fastq', '.qual') is still 'yourscript.py', the script is passed to SeqIO.convert as both input and output. Opening it for output overwrites it, which is why it ends up empty.
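To make the mechanism concrete, here is a minimal sketch that uses only the standard library. It assumes the converter opens its output path in write mode, as most file writers do. The function name demo_truncation and the temporary file are made up for illustration.

```python
import os
import tempfile


def demo_truncation():
    """Show that replace() leaves a non-.fastq name unchanged and that
    opening that same path for writing empties the file."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "yourscript.py")
        with open(script, "w") as fh:
            fh.write("print('hello')\n")
        size_before = os.path.getsize(script)

        # No ".fastq" in the name, so the "output" name equals the input name.
        out_name = script.replace(".fastq", ".qual")

        # Assumption: the converter opens its output like this (mode "w"),
        # which truncates an existing file to zero bytes.
        with open(out_name, "w"):
            pass

        return out_name == script, size_before, os.path.getsize(script)


same_name, size_before, size_after = demo_truncation()
print(same_name, size_before, size_after)
```

The last value printed is the size of the stand-in script after it was opened for writing.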


So, there are a few notes here:

  • Keep data and code separate. Ideally, in completely different directories. A $HOME/bin directory is an appropriate place to keep your own code -- if you add that directory to your PATH, then you can run executable commands in it from anywhere.
  • In your loop, filter out filenames that don't end in .fastq. That may look like:

     for f in filelist:
         if not f.endswith('.fastq'):
             continue
         SeqIO.convert(f, 'fastq', f[:-len('.fastq')]+'.qual', 'qual', alphabet=IUPAC.ambiguous_dna)
  • Because that check guarantees the name ends in .fastq, we can replace the extension more precisely: instead of searching the whole name for the string, slice that many characters off the end and append the new extension. This is slightly faster and leaves every part of the filename except the extension untouched.
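The filtering and suffix-swapping steps above can be sketched as a small planning helper that touches no sequence data, so it can be checked before any conversion runs. The names swap_extension and plan_conversions are hypothetical, not part of Biopython. Paths are joined with the directory because os.listdir returns bare filenames.

```python
import os


def swap_extension(name, old_ext, new_ext):
    """Replace old_ext at the END of name only; refuse other names."""
    if not name.endswith(old_ext):
        raise ValueError(f"{name!r} does not end in {old_ext!r}")
    return name[:-len(old_ext)] + new_ext


def plan_conversions(directory, old_ext=".fastq", new_exts=(".fasta", ".qual")):
    """Return [(input_path, [output_paths...])] for matching files only."""
    plan = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(old_ext):
            continue  # skips scripts and anything else that is not a .fastq file
        src = os.path.join(directory, name)
        outputs = [os.path.join(directory, swap_extension(name, old_ext, ext))
                   for ext in new_exts]
        plan.append((src, outputs))
    return plan


# str.replace rewrites every occurrence; slicing changes only the suffix.
print("b.fastq.v2.fastq".replace(".fastq", ".qual"))
print(swap_extension("b.fastq.v2.fastq", ".fastq", ".qual"))
```

Each planned pair could then be passed to SeqIO.convert(src, "fastq", out, "fasta") and SeqIO.convert(src, "fastq", out, "qual"), and len(plan) gives the number of files handled, which is the count the question asks for. On Biopython 1.78 and later, where Bio.Alphabet was removed, the alphabet argument would need to be dropped.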


 