
Invoking a Perl script with variable input and file output as arguments from Python

I have a perl script that can be executed from the console as follows:

perl perlscript.pl -i input.txt -o output.txt --append

I want to execute this script from my Python code. I figured out that subprocess.Popen can be used to run perl and that I can pass my arguments with it. However, I also want to pass a variable (made by splitting up a text file) in place of input.txt. I have tried the following, but it doesn't work and gives an obvious TypeError in line 8:

import re, shlex, subprocess, StringIO
f=open('fulltext.txt','rb')
text= f.read()
l = re.split('\n\n',str(text))
intxt = StringIO.StringIO()
for i in range(len(l)):
    intxt.write(l[i])
    command_line='perl cnv_ltrfinder2gff.pl -i '+intxt+' -o output.gff --append'
    args=shlex.split(command_line)
    p = subprocess.Popen(args)

Is there any other workaround for this?

EDIT: Here is a sample of the file fulltext.txt. Entries are separated by a blank line.

Predict protein Domains 0.021 second
>Sequence: seq1 Len:13143 [1] seq1 Len:13143 Location : 9 - 13124 Len: 13116 Strand:+ Score    : 6 [LTR region similarity:0.959] Status   : 11110110000 5'-LTR   : 9 - 501 Len: 493 3'-LTR   : 12633 - 13124 Len: 492 5'-TG    : TG , TG 3'-CA    : CA , CA TSR      : NOT FOUND Sharpness: 1,1 Strand + : PBS   : [14/20] 524 - 543 (LysTTT) PPT   : [12/15] 12553 - 12567

Predict protein Domains 0.019 second
>Sequence: seq5 Len:11539 [1] seq5 Len:11539 Location : 7 - 11535 Len: 11529 Strand:+ Score    : 6 [LTR region similarity:0.984] Status   : 11110110000 5'-LTR   : 7 - 506 Len: 500 3'-LTR   : 11036 - 11535 Len: 500 5'-TG    : TG , TG 3'-CA    : CA , CA TSR      : NOT FOUND Sharpness: 1,1 Strand + : PBS   : [15/22] 515 - 536 (LysTTT) PPT   : [11/15] 11020 - 11034

I want to separate them and pass each entry block to the perl script. All the files are in the same directory.

You might be interested in the os module and string formatting.

Edit

I think I understand what you want now. Correct me if I am wrong, but I think:

  • You want to split your fulltext.txt into blocks.
  • Every block contains a seq(number).
  • You want to run your perl script once for every block, with seq(number) as the input file.

If this is what you want, you could use the following code.

import os

in_file = 'fulltext.txt'
seq = []

# collect the sequence name from every header line (">Sequence: seq1 ...")
with open(in_file, 'r') as handle:
    for line in handle:
        if line.startswith(">"):
            seq.append(line.rstrip().split(" ")[1])

# run the perl script once per sequence, reading <name>.txt as input
for x in seq:
    command = "perl cnv_ltrfinder2gff.pl -i %s.txt -o output.txt --append" % x
    os.system(command)
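
The code above assumes that a file such as seq1.txt already exists for every sequence. If the blocks only exist in memory after splitting fulltext.txt, one possible workaround is to write each block to a temporary file and pass that file's path to -i. The following is a minimal sketch under that assumption; the helper names run_on_block and split_blocks are made up for illustration, and the command is passed in as a list so nothing about the perl script itself is assumed beyond the -i option shown in the question.

```python
import os
import subprocess
import tempfile


def split_blocks(text):
    """Split the file contents on blank lines, dropping empty pieces."""
    return [b for b in text.split('\n\n') if b.strip()]


def run_on_block(block, base_args):
    """Write one text block to a temporary file and pass its path via -i.

    base_args is the command without the -i option, for example
    ['perl', 'cnv_ltrfinder2gff.pl', '-o', 'output.gff', '--append'].
    Returns the exit status of the child process.
    """
    # delete=False lets us close the file before the child opens it
    # (required on Windows); the file is removed explicitly afterwards.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(block)
        path = tmp.name
    try:
        return subprocess.call(base_args + ['-i', path])
    finally:
        os.remove(path)
```

To use it, read fulltext.txt, call split_blocks on its contents, and call run_on_block once per block with the perl command as base_args. Passing the arguments as a list also avoids the string concatenation that caused the TypeError in the question.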

The docs for the --infile option say:

Path of the input file. If an input file is not provided, the program will expect input from STDIN.

You could omit --infile and pass the input via a pipe (stdin) instead:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('fulltext.txt') as file: # read input data
    blocks = file.read().split('\n\n')

# run a separate perl process for each block
args = 'perl cnv_ltrfinder2gff.pl -o output.gff --append'.split()
for block in blocks:
    p = Popen(args, stdin=PIPE, universal_newlines=True)
    p.communicate(block)
    if p.returncode != 0:
        print('non-zero exit status: %s on block: %r' % (p.returncode, block))
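
On Python 3.5 or later, the same idea can be written with subprocess.run, which takes the stdin data through its input parameter. This is only a sketch, not part of the original answer; feed_blocks is a hypothetical helper name, and the command is passed in as a list so it can be tried with any program that reads stdin.

```python
import subprocess


def feed_blocks(blocks, args):
    """Run `args` once per block, sending the block to the child's stdin.

    Returns a list of (returncode, block) pairs for the runs that failed.
    """
    failures = []
    for block in blocks:
        # input= writes the text to the child's stdin and then closes the pipe;
        # universal_newlines=True makes it accept str instead of bytes.
        result = subprocess.run(args, input=block, universal_newlines=True)
        if result.returncode != 0:
            failures.append((result.returncode, block))
    return failures
```

Calling feed_blocks(blocks, args) with the blocks and args defined above would run perl once per block and return an empty list if every run exited with status 0.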

You can run several perl scripts concurrently:

from multiprocessing.dummy import Pool # use threads
from subprocess import Popen, PIPE

def run(item):
    # unpack here: tuple parameters in a signature are Python 2 only
    i, block = item
    filename = 'out%03d.gff' % i  # one output file per block
    args = ['perl', 'cnv_ltrfinder2gff.pl', '-o', filename]
    p = Popen(args, stdin=PIPE, universal_newlines=True, close_fds=True)
    p.communicate(block)
    return p.returncode, filename

exit_statuses, filenames = zip(*Pool().map(run, enumerate(blocks, start=1)))

It runs several child processes in parallel (as many as there are CPUs on your system). You could specify a different number of worker threads by passing it to Pool().
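
As a sketch of how the worker count could be set explicitly, the helper below takes it as a parameter and hands it to Pool(). The names run_all and make_args are hypothetical: make_args(i) is assumed to return the command list for block number i, which keeps the example independent of any particular script.

```python
from multiprocessing.dummy import Pool  # thread-based Pool with the same API
from subprocess import Popen, PIPE


def run_all(blocks, make_args, workers=4):
    """Run one child process per block, at most `workers` at a time.

    make_args(i) is a caller-supplied callback returning the command
    (a list of strings) for block number i, counting from 1.
    Returns the exit statuses in the same order as `blocks`.
    """
    def run(item):
        i, block = item
        p = Popen(make_args(i), stdin=PIPE, universal_newlines=True)
        p.communicate(block)  # send the block to stdin and wait for exit
        return p.returncode

    pool = Pool(workers)  # the argument is the number of worker threads
    try:
        return pool.map(run, enumerate(blocks, start=1))
    finally:
        pool.close()
        pool.join()
```

For the perl script, make_args could return ['perl', 'cnv_ltrfinder2gff.pl', '-o', 'out%03d.gff' % i]. Threads are sufficient here because each worker spends its time waiting on a child process rather than computing in Python.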
