
Run a perl script with Python on multiple files at once in a folder

This is my perl script at the moment:

#!/usr/bin/perl
use open qw/:std :utf8/;
use strict;
use warnings;

if (defined $ARGV[0]){
my $filename = $ARGV[0];
my %count;

open (my $fh, $filename) or die "Can't open '$filename' $!";
while (<$fh>)
{
        $count{ lc $1 }++ while /(\w+)/g;
}
close $fh;

my $array = 0;

foreach my $word ( sort { $count{$b} <=> $count{$a} } keys %count)
{
    print "$count{$word} $word\n" if $array++ < 10;
}

}else{
print "Please enter the name of the file: ";
my $filename = ($_ = <STDIN>);

my %count;

open (my $fh, $filename) or die "Can't open '$filename' $!";
while (<$fh>)
{
        $count{ lc $1 }++ while /(\w+)/g;
}
close $fh;

my $array = 0;

foreach my $word ( sort { $count{$b} <=> $count{$a} } keys %count)
{
    print "$count{$word} $word\n" if $array++ < 10;
}
}

And this is my Python script at the moment:

#!/usr/bin/env python3
import os

perlscript = "perl " + " perlscript.pl " + " /home/user/Desktop/data/*.txt " + " >> " + "/home/user/Desktop/results/output.txt"
os.system(perlscript)

Problem: When there are multiple txt files in the data folder, the script only runs on one file and ignores all the others. Is there a way to run the Perl script on all the txt files at once?

Another problem: I'm also trying to delete the txt-files with the os.remove after they have been executed but they get deleted before the perlscript has a chance to execute.

Any ideas? :)

That Perl script processes only one file: it reads just $ARGV[0] and ignores any further arguments. So even when the shell that os.system invokes expands the * glob into a list of files, only the first file on that list gets processed.

Instead, build the file list in Python, using os.listdir, glob.glob, or os.walk. Then iterate over the list and call the Perl script on each file, if it must process only one file at a time. Or modify the Perl script to process multiple files and run it once with the whole list.
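As a rough sketch of the three ways of building the file list mentioned above, the helpers below each return the .txt files found in a folder. The function names are made up for this illustration. Note that only the os.walk version descends into subfolders, and that glob.glob skips names starting with a dot while the other two do not.

```python
import glob
import os


def txt_files_listdir(folder):
    """Top-level .txt files via os.listdir (it returns bare names, so join the folder back on)."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".txt")
    )


def txt_files_glob(folder):
    """Top-level .txt files via glob.glob (the returned paths already include the folder)."""
    return sorted(glob.glob(os.path.join(folder, "*.txt")))


def txt_files_walk(folder):
    """All .txt files in the folder and its subfolders via os.walk."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.endswith(".txt"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

With the question's layout, txt_files_glob("/home/user/Desktop/data") would give the full paths that can then be handed to the Perl script, one at a time or all at once.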

To keep the current Perl script and run it on each file:

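import os

data_path   = "/home/user/Desktop/data/"
output_path = "/home/user/Desktop/results/"    # "results", as in the question

for file in os.listdir(data_path):
    # os.listdir returns every entry in the folder, so skip non-.txt names
    if not file.endswith(".txt"):
        continue

    print("Processing " + file)
    # One shell command per file; >> appends to the shared output file
    # (better use subprocess for this)
    run_perlscript = "perl perlscript.pl " + \
        data_path + file + " >> " + output_path + "output.txt"
    os.system(run_perlscript)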

The Perl script needs to be rewritten to lose that unneeded code duplication: its two branches differ only in where the filename comes from.

However, it is better to use the subprocess module to run and manage external commands. This is advised even in the os.system documentation itself. For instance:

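import os
import subprocess

# data_path and output_path are defined as in the previous snippet
with open(output_path + "output.txt", "a") as fout:
    for file in os.listdir(data_path):
        if not file.endswith(".txt"):
            continue
        # The command is given as a list of arguments, so no shell is involved
        subprocess.run(["perl", "perlscript.pl", data_path + file], stdout=fout)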

where the file is opened in append mode ("a"), matching the question's >> redirection.

The recommended subprocess.run is available since Python 3.5; on older versions use subprocess.Popen.
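On the question's second problem (files being removed before the script has run), one possible approach is sketched below. subprocess.run waits for the external command to exit before returning, so a file can be deleted right after its own run, and only if that run succeeded. The function name run_then_remove and its parameters are invented for this illustration; for the question's setup, command would be something like ["perl", "perlscript.pl"].

```python
import os
import subprocess


def run_then_remove(command, files, output_file):
    """Run `command + [file]` for each file, appending its stdout to output_file.

    A file is deleted only after its command has finished with exit code 0.
    Returns the list of files that were processed and removed.
    """
    removed = []
    with open(output_file, "a") as fout:
        for path in files:
            # subprocess.run blocks here until the child process has exited
            result = subprocess.run(command + [path], stdout=fout)
            if result.returncode == 0:
                os.remove(path)
                removed.append(path)
            else:
                # Keep the input file so a failed run can be investigated
                print("Keeping " + path + ": exit code " + str(result.returncode))
    return removed
```

Keeping the files whose run failed is a design choice of this sketch; drop the returncode check if the files should be removed regardless.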

Another, and arguably the "right," option is to adjust the Perl script so that it can process multiple files. Then you only need to run it once, with the whole file list.

use strict;
use warnings;
use feature 'say';    
use open ':std', ':encoding(UTF-8)';

foreach my $filename (@ARGV) {
    say "Processing $filename";

    my %count;

    open my $fh, '<', $filename  or do {
       warn "Can't open '$filename': $!";
       next;
    };
    while (<$fh>) {   
        $count{ lc $1 }++ while /(\w+)/g;
    }   
    close $fh;

    my $prn_cnt = 0;
    foreach my $word ( sort { $count{$b} <=> $count{$a} } keys %count) {   
        print "$count{$word} $word\n" if $prn_cnt++ < 10; 
    }   
}

This prints a warning for any file that it can't open and skips to the next one. If you'd rather have the script exit on any unexpected file, replace or do { ... }; with the original die.

Then, using glob.glob as an example this time:

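import glob
import subprocess

data_path   = "/home/user/Desktop/data/"
output_path = "/home/user/Desktop/results/"    # "results", as in the question

# glob.glob returns full paths, so they can be passed to the script directly
files = glob.glob(data_path + "*.txt")

with open(output_path + "output.txt", "a") as fout:
    # Each file must be its own argument, so extend the list rather than nest it
    subprocess.run(["perl", "perlscript.pl"] + files, stdout=fout)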

Since this passes the whole list as command-line arguments, it assumes that there aren't so many files (high thousands, say) that the command exceeds the operating system's limit on the total length of the argument list.
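If the file list could grow that large, one way around the limit is to run the script several times, each time on a slice of the list. The following is only a sketch: the names chunked and run_in_batches are made up here, and the default batch size of 1000 is an arbitrary assumption rather than a value derived from any particular system's limit.

```python
import subprocess


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(command, files, output_file, batch_size=1000):
    """Run `command` once per batch of files, appending all stdout to output_file."""
    with open(output_file, "a") as fout:
        for batch in chunked(files, batch_size):
            # check=True raises CalledProcessError if a batch exits with an error
            subprocess.run(command + batch, stdout=fout, check=True)
```

Because the multi-file Perl script above counts words separately for each file, splitting the list into batches should not change its output.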
