
Perl one-liner to capture multiple matches in the same line

I want to pull only the numbers out of a file and organize them as CSV.

From:

  Aa:40, Bint:02 :  Bstring = 0x13   Ccc Num = 52   Dfloat = 164.0
  Aa:40, Bint:03 :  Bstring = 0x1B   Ccc Num = 10   Dfloat = 10.6
  Aa:41, Bint:04 :  Bstring = 0x1A   Ccc Num = 10   Dfloat = 1.6

to:

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

I can do this with Python's re.findall (shown below):

import re, sys
for line in sys.stdin:
    print(",".join(re.findall(r'\d+.?\w+', line)))

What would be the Perl way to achieve the same?

You are extracting numeric values from your strings.

A global match in list context returns every match on the line:

 m/(\d+)/g;

Since your values also contain . and x (plus hex digits), widen the character class:

 m/(\d[\d\.xA-F]+)/ig;

Or as a one-liner:

perl -nle 'print join ",",  m/(\d[\d\.xA-F]+)/ig;'

  • n wraps the code in while ( <> ) { ... }.

    This means you can pipe STDIN or specify a file after it, e.g.

    perl -nle 'print join ",", m/(\d[\d\.xA-F]+)/gi;' somefile
    cat somefile | perl -nle 'print join ",", m/(\d[\d\.xA-F]+)/gi;'

  • l is auto-chomp. It chomps the line ending on input and re-adds it after each print.

  • e executes the given snippet.

This effectively turns the one-liner above into:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    print join(',', /(\d[\d\.xA-F]+)/gi);
}

This gives:

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

which matches your desired output.

foo.pl - a direct translation of the Python snippet

print join (',', m/(\d+.?\w+)/g), "\n" foreach <STDIN>;

The important thing to notice is the use of /g when looking for matches. This flag effectively says that we are interested in every match present in the string, and not just the first.

Of course, the one-liner (that you specifically asked for) can also be written out in full as below, which might be a little more readable to the untrained eye:

foreach my $line (<STDIN>) {
  my @data = $line =~ m/(\d+.?\w+)/g;   # /g in list context returns all matches
  print join (',', @data), "\n";
}


%              
Aa:40, Bint:02 :  Bstring = 0x13   Ccc Num = 52   Dfloat = 164.0
Aa:40, Bint:03 :  Bstring = 0x1B   Ccc Num = 10   Dfloat = 10.6
Aa:41, Bint:04 :  Bstring = 0x1A   Ccc Num = 10   Dfloat = 1.6
% 
40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

Try something like this:

# Declare the regex
my $is_num = qr { 
                    (?: 0x[0-9a-fA-F]+ ) # Match stuff like 0x1B
                    |                    # Or
                    \d+ (?: \.\d+ )?     # 5 or 5.2
                }x; 

chomp(my @data = <DATA>);
for(@data){
   my @new;
   push @new, $1 while /($is_num)/g;
   $_ = join ",", @new;
}

print "$_\n" for @data;

__DATA__
Aa:40, Bint:02 :  Bstring = 0x13   Ccc Num = 52   Dfloat = 164.0
Aa:40, Bint:03 :  Bstring = 0x1B   Ccc Num = 10   Dfloat = 10.6
Aa:41, Bint:04 :  Bstring = 0x1A   Ccc Num = 10   Dfloat = 1.6

Output:

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

I'm sure there are better ways to do it, though. That's the first that came to mind.

Another way (reading the same __DATA__ section as above):

# Declare the regex
my $is_num = qr { 
                    (?: 0x[0-9a-fA-F]+ )  # Match stuff like 0x1B
                    |                     # Or
                    \d+ (?: \.\d+ )?      # 5 or 5.2
                }x;  


chomp(my @data = <DATA>);
for(@data){
   s/.*? ($is_num)/$1,/xg;
   s/\W+$//x;
}
print "$_\n" for @data;

The output is the same:

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6
