Invoking perl script with variable input and file output as arguments from python

I have a perl script that can be executed from the console as follows:

perl perlscript.pl -i input.txt -o output.txt --append

I want to execute this script from my Python code. I figured out that subprocess.Popen can be used to launch perl and pass my arguments to it. However, I also want to pass a variable (made by splitting up a text file) in place of input.txt. I have tried the following, but it doesn't work and raises a TypeError on line 8:

import re, shlex, subprocess, StringIO
f=open('fulltext.txt','rb')
text= f.read()
l = re.split('\n\n',str(text))
intxt = StringIO.StringIO()
for i in range(len(l)):
    intxt.write(l[i])
    command_line='perl cnv_ltrfinder2gff.pl -i '+intxt+' -o output.gff --append'
    args=shlex.split(command_line)
    p = subprocess.Popen(args)

Is there a workaround for this?

EDIT: Here is a sample of the file fulltext.txt. Entries are separated by a blank line.

Predict protein Domains 0.021 second
>Sequence: seq1 Len:13143
[1] seq1 Len:13143
Location : 9 - 13124 Len: 13116 Strand:+
Score    : 6 [LTR region similarity:0.959]
Status   : 11110110000
5'-LTR   : 9 - 501 Len: 493
3'-LTR   : 12633 - 13124 Len: 492
5'-TG    : TG , TG
3'-CA    : CA , CA
TSR      : NOT FOUND
Sharpness: 1,1
Strand + :
PBS   : [14/20] 524 - 543 (LysTTT)
PPT   : [12/15] 12553 - 12567

Predict protein Domains 0.019 second
>Sequence: seq5 Len:11539
[1] seq5 Len:11539
Location : 7 - 11535 Len: 11529 Strand:+
Score    : 6 [LTR region similarity:0.984]
Status   : 11110110000
5'-LTR   : 7 - 506 Len: 500
3'-LTR   : 11036 - 11535 Len: 500
5'-TG    : TG , TG
3'-CA    : CA , CA
TSR      : NOT FOUND
Sharpness: 1,1
Strand + :
PBS   : [15/22] 515 - 536 (LysTTT)
PPT   : [11/15] 11020 - 11034

I want to separate them and pass each entry block to the perl script. All the files are in the same directory.

You might be interested in the os module and string formatting.

Edit

I think I understand what you want now. Correct me if I am wrong, but I think:

  • You want to split your fulltext.txt into blocks.
  • Every block contains a sequence name of the form seq(number).
  • You want to run your perl script once for every block, using the file named after that seq(number) as the input file.

If this is what you want, you could use the following code.

import os

in_file = 'fulltext.txt'
seq = []

# Collect the sequence name from every header line, e.g. ">Sequence: seq1 Len:13143"
with open(in_file, 'r') as handle:
    for line in handle:
        if line.startswith(">"):
            seq.append(line.rstrip().split(" ")[1])

# Run the perl script once per sequence; this expects a file such as seq1.txt to exist
for x in seq:
    command = "perl cnv_ltrfinder2gff.pl -i %s.txt -o output.txt --append" % x
    os.system(command)
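If the per-sequence input files do not exist yet, one option is to write each block to a temporary file and pass that path with -i. The following is only a sketch under that assumption: the helper names split_blocks and run_on_block are made up for illustration, and base_cmd is expected to be something like ['perl', 'cnv_ltrfinder2gff.pl'].

```python
import os
import subprocess
import tempfile


def split_blocks(text):
    """Split the text on blank lines and drop empty pieces."""
    return [block for block in text.split('\n\n') if block.strip()]


def run_on_block(block, base_cmd, output='output.gff'):
    """Write one block to a temporary file, pass its path via -i,
    and return the exit status of the child process."""
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(block)
        path = tmp.name
    try:
        # Passing a list of arguments avoids shell quoting problems.
        return subprocess.call(base_cmd + ['-i', path, '-o', output, '--append'])
    finally:
        os.remove(path)  # clean up the temporary input file
```

A typical call would read fulltext.txt, loop over split_blocks(text), and call run_on_block(block, ['perl', 'cnv_ltrfinder2gff.pl']) for each block.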

The docs for the --infile option say:

Path of the input file. If an input file is not provided, the program will expect input from STDIN.

You could omit --infile and pass input via a pipe (stdin) instead:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('fulltext.txt') as file: # read input data
    blocks = file.read().split('\n\n')

# run a separate perl process for each block
args = 'perl cnv_ltrfinder2gff.pl -o output.gff --append'.split()
for block in blocks:
    p = Popen(args, stdin=PIPE, universal_newlines=True)
    p.communicate(block)
    if p.returncode != 0:
        print('non-zero exit status: %s on block: %r' % (p.returncode, block))
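Splitting on '\n\n' can produce an empty piece when the file ends with extra blank lines, which would start perl with no input. A possible variant, assuming Python 3.5+ where subprocess.run is available, skips such pieces; feed_blocks is a hypothetical helper name and cmd stands for the perl command list shown above.

```python
import subprocess


def feed_blocks(text, cmd):
    """Run cmd once per non-empty block, sending the block on stdin.
    Returns the list of exit statuses, one per block that was run."""
    statuses = []
    for block in text.split('\n\n'):
        if not block.strip():
            continue  # skip empty pieces, e.g. from trailing blank lines
        proc = subprocess.run(cmd, input=block, universal_newlines=True)
        statuses.append(proc.returncode)
    return statuses
```

For example, feed_blocks(open('fulltext.txt').read(), 'perl cnv_ltrfinder2gff.pl -o output.gff --append'.split()) would mirror the loop above.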

You can run several perl scripts concurrently:

from multiprocessing.dummy import Pool # use threads

# continues the previous snippet: Popen, PIPE and blocks are defined there
def run(item):
    i, block = item  # unpack here; tuple parameters in def are a syntax error on Python 3
    filename = 'out%03d.gff' % i
    args = ['perl', 'cnv_ltrfinder2gff.pl', '-o', filename]
    p = Popen(args, stdin=PIPE, universal_newlines=True, close_fds=True)
    p.communicate(block)
    return p.returncode, filename

exit_statuses, filenames = zip(*Pool().map(run, enumerate(blocks, start=1)))

It runs several child processes in parallel (as many as there are CPUs on your system). To use a different number of worker threads, pass it to Pool(), e.g. Pool(4).
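The same idea can also be expressed with concurrent.futures from the standard library (Python 3), where the worker count is an explicit argument. This is a sketch, not the answer's original method: run_one, run_all, workers and out_dir are illustrative names, and base_cmd is assumed to be ['perl', 'cnv_ltrfinder2gff.pl'].

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(item, base_cmd, out_dir='.'):
    """Run one child process for a numbered block; return (status, output file)."""
    i, block = item
    filename = os.path.join(out_dir, 'out%03d.gff' % i)
    proc = subprocess.run(base_cmd + ['-o', filename],
                          input=block, universal_newlines=True)
    return proc.returncode, filename


def run_all(blocks, base_cmd, workers=4, out_dir='.'):
    """Process all blocks with at most `workers` children running at once.
    Results come back in the same order as the input blocks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = enumerate(blocks, start=1)
        return list(pool.map(lambda item: run_one(item, base_cmd, out_dir), jobs))
```

Threads are enough here because the real work happens in the child processes; the Python side only waits for them.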

