
Perl regular expression to read square brackets

I would like to read the bit range inside the square brackets, and I also want to keep the brackets themselves. The tricky part is class4: sample1[1] is not a bit. The bit is only the bracketed part at the end of the line.

Example:

File1.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Expected result:

class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]

I used a regular expression, but the square brackets are not matched. According to the cheat sheet, [] is used for a set of characters and . matches any character except newline. Ref: https://www.geeksforgeeks.org/perl-regex-cheat-sheet/

My code:

my $file = "File1.txt";
my $line;

open (FILE,"<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    my $line = $_;
    if ($line =~ m/[..]/){
        $line = $&;
    }
}
close (FILE);

The result only shows: .........

I hope you can help me with some ideas. Thanks.

With your shown samples, please try the following regex (PCRE).

^([^-]*)->.*?(\[[^]]*\]);$

Here is the online demo for above regex.

Explanation: a detailed breakdown of the regex above.

^            ## Anchor at the start of the line.
(            ## Open the 1st capturing group.
  [^-]*      ## Match everything up to the next -.
)            ## Close the 1st capturing group.
->           ## Match a literal ->.
.*?          ## Lazily match up to a [ from which the rest of the pattern can succeed.
(            ## Open the 2nd capturing group.
  \[[^]]*    ## Match a literal [ followed by any characters other than ].
  \]         ## Match a literal ].
)            ## Close the 2nd capturing group.
;$           ## Match a literal ; at the end of the line.
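
As a minimal sketch of how this pattern could be used from Perl, the two capture groups give the class name and the final bracket. The helper name extract_bit is hypothetical and not part of the answer; the sample lines come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "classN bit = [..]" for a matching line,
# or undef when the line does not match the pattern shown above.
sub extract_bit {
    my ($line) = @_;
    return unless $line =~ /^([^-]*)->.*?(\[[^]]*\]);$/;
    return "$1 bit = $2";
}

my @samples = (
    'class1->Signal = sample1_sample2.sample3_sample4[4:4];',
    'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];',
);

for my $line (@samples) {
    my $result = extract_bit($line);
    print "$result\n" if defined $result;
}
```

Because the pattern ends in ;$, the lazy .*? cannot stop at sample1[1]; it has to keep expanding until the bracket that is followed by the closing semicolon.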

You could select the part that you want to remove, and replace it with " bit = ":

^[^-]*\K->.*(?=\[[^][]*\];$)

Explanation

  • ^ Start of string
  • [^-]*\K Match optional chars other than - and forget what is matched so far using \K
  • ->.* Match -> and the rest of the line
  • (?=\[[^][]*\];$) Positive lookahead, assert [...]; at the end of the line

See a regex demo and a Perl demo

Example

use strict;
use warnings;

while (<DATA>)
{
  s/^[^-]*\K->.*(?=\[[^][]*\];$)/ bit = /;
  print $_;
}

__DATA__
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Output

class1 bit = [4:4];
class2 bit = [2];
class3 bit = [7:3];
class4 bit = [7:3];

Or a bit more specific regex:

^class\d+\K->.*(?=\[[^][]*\];$)

See another regex demo.

[..] creates a character class that matches any one of the characters within the brackets, which in this case is just the period.

Since you are only matching literal periods, this is all you see.
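
A small sketch of the difference, using one of the sample lines from the question (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = 'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];';

# [..] is a character class containing only "."; it matches one literal period.
my ($class_match) = $line =~ /([..])/;

# Escaped brackets are literal; [^\]]* then matches whatever is between them.
my ($bracket_match) = $line =~ /(\[[^\]]*\])/;

print "character class matched: $class_match\n";
print "escaped brackets matched: $bracket_match\n";
```

The first match is a single period, which is why the original code printed only dots; the second is the bracketed part.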

This problem can be solved with a fairly simple regex.

Since you only want the last bracket, you can rely on the greediness of .* to skip any brackets in the middle:

use strict;
use warnings;

my $file = "File1.txt";
my $line;

open (FILE, "<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    $line = $_;
    # $1 is the class name, $2 is the last bracketed group on the line
    if( $line =~ /(class\d).*(\[[^\]]*\]);/ ){
        $line = "$1 bit = $2";
        print "$line\n";
    }
}
close (FILE);

The regex /(class\d).*(\[[^\]]*\]);/ matches class followed by a digit. The .* is greedy, so it first consumes the rest of the line and then gives back just enough for (\[[^\]]*\]); to match.

Using ^ as the first character in a character class negates it, so it matches anything EXCEPT the characters listed. To match a literal [ you have to escape it as \[.

(              # capture to $1 
    class\d    # match "class" followed by a digit
)              # end capture
.*             # match anything (greedy)
(              # capture to $2
    \[         # literal [
    [^\]]*     # match anything, except ] (greedy)
    \]         # literal ]
)              # end capture
;              # match ;

The parentheses save what is matched within them to the variables $1, $2, and so on.
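
To see why the greedy .* matters, here is a short sketch that runs the same pattern (written with the /x flag so it can carry comments) against the class4 line, next to a lazy variant added only for contrast. The lazy pattern is an illustration and not part of the answer.

```perl
use strict;
use warnings;

# The pattern from above, spread out with /x.
my $re = qr{
    (class\d)     # capture the class name
    .*            # greedy: run to the end of the line, then backtrack
    (\[[^\]]*\])  # capture the last bracketed group
    ;             # literal semicolon
}x;

my $line = 'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];';

my $greedy = $line =~ $re ? "$1 bit = $2" : 'no match';

# For contrast: a lazy .*? with no trailing ; stops at the first bracket.
my ($lazy) = $line =~ /class\d.*?(\[[^\]]*\])/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy version yields class4 bit = [7:3], while the lazy one picks up [1], which is the bracket the question wants to skip.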

This can also be done with a substitution, using the same regex and the /r flag to return the modified value instead of changing $_ in place:

while (<FILE>){
    $line = s/(class\d).*(\[[^\]]*\]);/$1 bit = $2/r;
}
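
A standalone sketch of what /r does, assuming Perl 5.14 or later (where the flag was introduced); the variable names are only for illustration:

```perl
use strict;
use warnings;

my $original = 'class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];';

# With /r, s/// returns the modified copy and leaves $original untouched.
my $converted = $original =~ s/(class\d).*(\[[^\]]*\]);/$1 bit = $2/r;

print "$converted\n";
print "$original\n";
```

The first line printed is the converted form, the second is the unchanged input line.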

Here's a simple command line one-liner that'll do the same:

perl -wlp -e 's/(class\d).*(\[[^\]]*\]);/$1 bit = $2/' File1.txt

Change ' to " to run it on Windows.

cat /tmp/a.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

sed -e 's/->.*\[/ bit = [/g' -e 's/;//g'  /tmp/a.txt
class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]
