perl one-liner to capture multiple matches in the same line

Question

I want to pull only the numbers out of a file and organize them as CSV.

From:

  Aa:40, Bint:02 :  Bstring = 0x13   Ccc Num = 52   Dfloat = 164.0
  Aa:40, Bint:03 :  Bstring = 0x1B   Ccc Num = 10   Dfloat = 10.6
  Aa:41, Bint:04 :  Bstring = 0x1A   Ccc Num = 10   Dfloat = 1.6

to:

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

I can do this with Python's re.findall (shown below):

import re
import sys

for line in sys.stdin:
    print(",".join(re.findall(r'\d+.?\w+', line)))

What would be the Perl way to achieve the same?

Answer 1

You are extracting numeric values from your strings.

You can do this with:

 m/(\d+)/g;

Of course, since you also want to include . and x (plus the hex digits A-F):

 m/(\d[\d\.xA-F]+)/ig;

Or as a one liner:

perl -nle 'print join ",",  m/(\d[\d\.xA-F]+)/ig;'

n is "wrap this in while ( <> ) { ... }".
This means you can pipe STDIN or specify a file after it, e.g.:

perl -nle 'print join ",", m/(\d[\d\.xA-F]+)/gi;' somefile
cat somefile | perl -nle 'print join ",", m/(\d[\d\.xA-F]+)/gi;'

l is auto-chomp. It chomps linefeeds from the input and re-adds them after each print.
e is "execute this snippet".

Together, these flags effectively turn the m/(\d[\d\.xA-F]+)/ig one-liner into:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    print join(',', /(\d[\d\.xA-F]+)/gi);
}

This gives:

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

Which looks like your desired output.

Answer 2

foo.pl - direct translation of the Python snippet

print join (',', m/(\d+.?\w+)/g), "\n" foreach <STDIN>;

The important thing to notice is the use of /g when looking for matches. This flag effectively says that we are interested in every match present in the string, not just the first.

Of course, the one-liner above (which is what you specifically asked for) can also be written out as below, which might be a little more readable to the untrained eye:

foreach my $line (<STDIN>) {
  my @data = $line =~ m/(\d+.?\w+)/g;
  print join (',', @data), "\n";
}

%              
Aa:40, Bint:02 :  Bstring = 0x13   Ccc Num = 52   Dfloat = 164.0
Aa:40, Bint:03 :  Bstring = 0x1B   Ccc Num = 10   Dfloat = 10.6
Aa:41, Bint:04 :  Bstring = 0x1A   Ccc Num = 10   Dfloat = 1.6
% 
40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

Answer 3

Try something like this:

# Declare the regex
my $is_num = qr { 
                    (?: 0x[0-9a-fA-F]+ ) # Match stuff like 0x1B
                    |                    # Or
                    \d+ (?: \.\d+ )?     # 5 or 5.2
                }x; 

chomp(my @data = <DATA>);
for(@data){
   my @new;
   push @new, $1 while /($is_num)/g;
   $_ = join ",", @new;
}

print "$_\n" for @data;

__DATA__
Aa:40, Bint:02 :  Bstring = 0x13   Ccc Num = 52   Dfloat = 164.0
Aa:40, Bint:03 :  Bstring = 0x1B   Ccc Num = 10   Dfloat = 10.6
Aa:41, Bint:04 :  Bstring = 0x1A   Ccc Num = 10   Dfloat = 1.6

Output

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

I'm sure there are better ways to do it, though. That's the first that came to my mind.

Answer 4

Another way

# Declare the regex
my $is_num = qr { 
                    (?: 0x[0-9a-fA-F]+ )  # Match stuff like 0x1B
                    |                     # Or
                    \d+ (?: \.\d+ )?      # 5 or 5.2
                }x;  


chomp(my @data = <DATA>);
for(@data){
   s/.*? ($is_num)/$1,/xg;
   s/\W+$//x;
}
print "$_\n" for @data;

Output is the same

40,02,0x13,52,164.0
40,03,0x1B,10,10.6
41,04,0x1A,10,1.6

4 answers

solution1 4 ACCEPTED 2015-10-09 08:24:42

solution2 3 2015-10-09 08:19:31

solution3 1 2015-10-09 02:53:49

solution4 0 2015-10-09 03:17:21
