Using python subprocess.call for writing count of fasta sequences to file

Question

I have over 14000 fasta files, and I want to keep only the ones containing 5 sequences. I know I can use the following bash command to obtain the number of sequences in a single fasta file:

grep -c "^>" filename.fasta

So my approach was to write the filename and count of sequences in each file to a text file, which I could then use to isolate only the sequences I want. To run the grep command on so many files, I am using subprocess.call:

import subprocess
import os


with open("five_seqs.txt", "w") as f:
    for file in os.listdir("/Users/vivaksoni1/Downloads/DA_CDS/fasta_files"):
        f.write(file),
        subprocess.call(["grep", "-c", "^>", file], stdout = f)

Part of my problem is that the grep pattern is "^>", which already carries its own quotation marks, while subprocess requires each argument to be a quoted string. How can I pass "^>" when I would essentially be entering ""^>"" as an argument?

Also, do I have to add f.write("\n") after f.write(file)? Currently my output is just a text file with the entries run together, and the subprocess command just prints each file name to the terminal and reports that the file was not found:

grep: MZ23900789.fasta: No such file or directory

Answer 1

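Try the following code; it should work for your example. It writes the filename, a tab separator, and the number of sequences (i.e. lines starting with >). When the arguments are given as a list, each one is passed to grep directly without going through a shell, so "^>" needs no extra quoting. The "No such file or directory" error occurs because os.listdir returns bare file names, so the directory has to be prepended. Using Popen and communicate gives more flexibility in handling the output. Tested on Ubuntu.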

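import subprocess
import os

fasta_dir = "/Users/vivaksoni1/Downloads/DA_CDS/fasta_files/"

with open("five_seqs.txt", "w") as f:
    for file in os.listdir(fasta_dir):
        # os.listdir returns bare names, so build the full path for grep.
        path = os.path.join(fasta_dir, file)
        # universal_newlines=True makes communicate() return text, not bytes.
        grep = subprocess.Popen(["grep", "-c", "^>", path],
                                stdout=subprocess.PIPE, universal_newlines=True)
        out, err = grep.communicate()
        # grep's output already ends with a newline; strip it and add our own.
        f.write(file + '\t' + out.strip() + '\n')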
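As an alternative to calling grep once per file, the count can also be done in Python itself, which avoids starting 14000 subprocesses. The following is only a minimal sketch, not part of the original answer: the helper names count_sequences and files_with_n_sequences are made up for illustration, and it assumes every regular file in the directory is a plain-text fasta file whose records each start with a ">" line.

```python
import os


def count_sequences(path):
    """Count fasta records by counting lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def files_with_n_sequences(fasta_dir, n=5):
    """Return the sorted names of files in fasta_dir holding exactly n records.

    Hypothetical helper for illustration; subdirectories are skipped.
    """
    matches = []
    for name in sorted(os.listdir(fasta_dir)):
        path = os.path.join(fasta_dir, name)
        if os.path.isfile(path) and count_sequences(path) == n:
            matches.append(name)
    return matches
```

Calling files_with_n_sequences(fasta_dir) with the fasta_dir from the code above would return the names of the files that contain exactly five sequences, which could then be written to five_seqs.txt or copied elsewhere directly.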

solution1
2 ACCEPTED 2016-04-30 18:16:56