How do I execute an AWK statement in a Perl script which uses a Perl variable?

I'm trying to use an awk statement inside a Perl script which takes user input and searches through a number of text files to find lines which match all the words in the input, in any order. To that end, I'm able to do the awk search I want on the CLI thusly:

awk 'tolower($0) ~ / 204/ && / test/ && / leg/' *_Codes.txt

This will return lines in the referenced text files which include words starting with '204', 'test' and 'leg', such as 'left legs being tested in room 2045'.

When I try to do this in a Perl script, however, setting the user input to a variable and modifying it to include the && operators and slashes, I'm not getting anything back. Here's what I have:

my ($code_search, $code_set) = @_;

# Clean the input for awk
# trim whitespace from the ends
$code_search =~ s!(^\s+|\s+$)!!g;

# separate words with the && operator and slashes
$code_search =~ s!\s+!/ && / !g;

# make input lower case and tack on front and back slashes 
my $sanitized_query = lc "/ ${code_search}/";

# at this point, a user input of '204 leg test'
# is transformed to '/ 204/ && / leg/ && / test/'
# and is saved to the $sanitized_query variable

# run the query through awk and save it to $results
my $results = `awk 'tolower($0) ~ \$sanitized_query' *_Codes.txt`;

But $results doesn't give me anything.

Maybe awk isn't the right tool for the job here, but it seems better suited for my needs than grep is, as I want to make sure I can search for all the terms entered and return results where they all appear in a line of text in any order.

Any assistance is much appreciated.

Why not do it entirely in Perl, rather than using awk? You should be able to open each file, read it line by line, and print a line if the regex matches. Regular expressions are one of Perl's greatest strengths, so why not take advantage of them directly rather than calling awk?

The only advantage I see to using awk is that the shell expands *_Codes.txt for you, whereas in Perl you have to list the matching files yourself, but that shouldn't be too difficult.

The easiest way to do it in Perl, assuming you have a line of text, is simply to run the regex three times, once for each portion you're trying to match. For example, if you want to match 204, test and leg, you should be able to do

if (($line =~ m/ 204/i) && ($line =~ m/ test/i) && ($line =~ m/ leg/i)){
    print $line;
}

$0 is a valid symbol in Perl, too (it contains the name of the currently running Perl script) and is also interpolated inside backquotes. You need to escape it, too:

my $results = `awk 'tolower(\$0) ~ \$sanitized_query' *_Codes.txt`;

Pure Perl solution, including splitting $code_search , globbing the filenames, and matching the patterns only at the beginnings of words:

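use List::MoreUtils qw{ all };

# Split the user input into individual search words.
my @words = ($code_search =~ m/\S+/g);

for my $fn (glob('*_Codes.txt')) {
    # Use low-precedence "or" here: "||" would bind to $fn and the die would never run.
    open my $f, '<', $fn or die "Can't open $fn: $!";

    while (defined(my $line = <$f>)) {
        # Print the line if every word matches at the start of a word, ignoring case.
        if (all { $line =~ m{\b\Q$_\E}is } @words) { print $line }
    }

    close $f;
}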

If you don't want to depend on List::MoreUtils, then change the 'if' to:

        if (!grep { $line !~ m{\b\Q$_\E}is } @words) { print $line }

This is a bit harder to read, but it only uses Perl builtins.
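
Another builtin-only variant is possible if your List::Util is version 1.33 or newer (bundled with Perl 5.20 and later), since it provides its own all. The following is only a sketch: line_matches_all is a hypothetical helper name, and the two sample lines are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(all);    # all() needs List::Util 1.33 or newer

# Hypothetical helper: true if every search word matches at the
# start of some word in $line, ignoring case.
sub line_matches_all {
    my ($line, @words) = @_;
    return all { $line =~ /\b\Q$_\E/i } @words;
}

my @words = qw(204 test leg);

# Made-up sample lines; in the real script these would come from the files.
for my $line ('left legs being tested in room 2045',
              'leg test in room 1204') {
    print "$line\n" if line_matches_all($line, @words);
}
```

Only the first sample line is printed: in the second one, 204 appears in the middle of 1204 rather than at the start of a word.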

To build upon what @mob said, I think this is an escaping issue. He's escaping too much, though. What you need is something like this:

my $results = `awk 'tolower(\$0) ~ $sanitized_query' *_Codes.txt`;

You want $0 to be literal, but $sanitized_query to be interpolated. (In your code example above, you're escaping the wrong one.)
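
One more thing to watch for if you stay with awk: in an expression like tolower($0) ~ / 204/ && / test/, only the first pattern is compared with the lower-cased line, because a bare /regex/ in awk is tested against the original $0. Below is a rough sketch of one way to build the program so every word is tested against the lower-cased line, and to run it without going through the shell. The helper name build_awk_program is hypothetical, the escaping covers only common regex metacharacters, and the leading space before each word is kept from the question (so a word at the very start of a line will not match).

```perl
use strict;
use warnings;

# Hypothetical helper: turn search words into an awk program that
# lower-cases each line once and requires every word to match.
sub build_awk_program {
    my @words = @_;
    my @tests;
    for my $word (@words) {
        my $pattern = lc $word;
        # Backslash-escape characters that are special in an awk regex.
        $pattern =~ s/([\\\/.^\$*+?()\[\]{}|])/\\$1/g;
        push @tests, "line ~ / $pattern/";
    }
    # Single quotes keep Perl from interpolating $0.
    return '{ line = tolower($0) } ' . join(' && ', @tests);
}

my @files = glob '*_Codes.txt';
if (@files) {
    # List-form pipe open: no shell is involved, so no shell quoting is needed.
    open(my $awk, '-|', 'awk', build_awk_program(qw(204 leg test)), @files)
        or die "Cannot run awk: $!";
    print while <$awk>;
    close $awk;
}
```

Because the program is passed to awk as a single argument, quotes in the user's input cannot break out of the command the way they could inside backquotes.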

While Skolor's answer is completely appropriate, here's a slightly different approach that uses the smart match operator (available in Perl 5.10 or later, but marked experimental since 5.18 and deprecated in 5.38, so expect warnings on newer versions). Note that it compares whole whitespace-separated words exactly, so '204' will not match '2045'. If the lines of your text file are really long and you don't have many words to check against them, this might be a quicker approach (emphasis on "might").

use strict;
use warnings;

my @query_words=qw(204 test leg);

open(my $read,"<","input_file") or die $!;

while(my $line=<$read>)
{
  chomp $line; #get rid of trailing newline
  my @words=split(/\s+/,$line); #split on whitespace to get actual words

  my $all_found=1;
  foreach my $q (@query_words)
  {
    unless($q~~@words) #A query word is missing, so stop checking this line.
    {
      $all_found=0;
      last;
    }
  }
  print "$line\n" if $all_found; #print only when every query word was found
}

close($read);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.