
Perl's Regular Expression in a Tibetan Script

I am trying to remove the second-last character of a string in Tibetan script, as shown below (the characters in the following example are English stand-ins):

$char = "ti.be.tan.|";           

So I want to remove the second-last character, ".". With my limited knowledge of regular expressions, I tried the following:

$char =~ s/.|$/|/g;
$char =~ s/[.|]$/|/g;
$char = tr/.|//d;       # and later add |.

What am I doing wrong?

Before I tell you what you need to do right, let's look at what you're doing wrong:

$char =~ s/.|$/|/g;

The problem here is that both . and | are metacharacters in regular expressions. The | means "or", so you're saying "match . or $". You correctly know that $ means the end of the string, but . means "any one character." So the first alternative matches the first character, then the next one, and so on, changing each one to | (metacharacters don't apply in the replacement half of the s/// expression). Once every character has been replaced, the $ alternative matches the empty string at the end and adds one more |, so you end up with a string made entirely of | characters. Basically, not what you want to happen.
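A quick way to confirm this is to run the attempt on a copy of the example string and look at the result. This is a small demonstration script; `$broken` is just an illustrative variable name.

```perl
use strict;
use warnings;

my $char = "ti.be.tan.|";

# Unescaped: "." matches any character and "|" is alternation,
# so the pattern means "(any one character) OR (end of string)".
(my $broken = $char) =~ s/.|$/|/g;

print "$broken\n";                                      # ||||||||||||
print length($char), " -> ", length($broken), "\n";    # 11 -> 12
```

Every character is replaced, and the zero-width match at the end contributes the twelfth |.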

$char =~ s/[.|]$/|/g;

Well, inside [], . and | stop being metacharacters, but [] means "one of these," so this regular expression looks at the last character of the string (the one right before the end), and if it's either . or |, it changes it to |. Your string already ends in |, so it just replaces | with | and nothing changes. Again, not what you want to happen.
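Running this attempt on a copy of the string shows that it is effectively a no-op for this input. A minimal demonstration; `$result` is an illustrative name.

```perl
use strict;
use warnings;

my $char = "ti.be.tan.|";

# [.|] is a character class: it matches ONE character that is "." or "|".
# Anchored with $, it can only ever look at the last character.
(my $result = $char) =~ s/[.|]$/|/g;

print "$result\n";    # ti.be.tan.|  (the final "|" was replaced by "|")
```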

$char = tr/.|//d;       # and later add |.

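tr is the wrong tool for this job. Even bound to $char, it would delete every . and | in your string, not just one. But you're not using the =~ binding operator at all; you're using the = assignment operator. That makes tr operate on $_ instead of $char, and $char is assigned tr's return value (the number of characters deleted). Definitely not what you want to happen.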
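The sketch below shows both behaviours. It sets `$_` explicitly (an assumption; in the question `$_` was probably unset) so the effect of the `=` version is visible, then shows that even the correctly bound `=~` version removes too much. `$count` and `$copy` are illustrative names.

```perl
use strict;
use warnings;

my $char = "ti.be.tan.|";

# With "=" instead of "=~", tr/// works on $_ and returns a count.
$_ = $char;
my $count = tr/.|//d;    # inside tr, "." and "|" are plain characters
print "$_\n";            # tibetan
print "$count\n";        # 4  (three dots and one bar deleted)

# Bound with "=~", tr/// edits the string itself, but still deletes
# ALL dots and bars, not only the second-last character.
(my $copy = $char) =~ tr/.|//d;
print "$copy\n";         # tibetan
```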

What you want is this:

$char =~ s/\.\|$/|/;

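We've escaped both the . and the | with a \ so Perl knows "the character after the \ is a literal character with no special meaning" (true for punctuation like this; before some letters, such as \n or \d, a backslash creates a special sequence instead). The regex then matches a literal .| at the end of your string and replaces it with just |.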
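Here is the fix in a runnable form, plus an equivalent that lets Perl's `\Q...\E` (the same escaping `quotemeta` performs) escape the literal text instead of writing each backslash by hand. `$tail`, `$fixed` and `$fixed2` are illustrative names.

```perl
use strict;
use warnings;

my $char = "ti.be.tan.|";

# Hand-escaped: \. is a literal dot and \| is a literal bar.
(my $fixed = $char) =~ s/\.\|$/|/;
print "$fixed\n";     # ti.be.tan|

# Same substitution, with \Q...\E escaping the contents of $tail.
my $tail = ".|";
(my $fixed2 = $char) =~ s/\Q$tail\E$/|/;
print "$fixed2\n";    # ti.be.tan|
```

The `\Q` form is handy when the text to match comes from a variable and may contain metacharacters you don't want to think about.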

That said, it sounds like you're fairly new to regular expressions. I'm a big fan of perldoc perlretut, which I think is one of the best (if not the best) introductions to regular expressions in Perl. You should really read it: regexes are a powerful tool in the hands of those who know them, and a powerful headache to those who don't.

Chris Lutz has already provided an excellent answer, so I just want to add another one in case you want to remove the second-last character of other kinds of strings.

Here it is:

$char =~ s/(.)(.)$/$2/;

Basically, the regex engine captures whatever matches between each '(' and ')' into a numbered group, which you can refer to later ($1, $2, ... in the replacement). In this code the groups are:

$char =~ s/(.)(.)$/$2/;
#          ^-^^-^  ^^
#  Capture G1 G2   ++-- Then replace it with only group 2

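So in this case, Perl scans the string from the first character; wherever the pattern doesn't match, nothing is replaced. Because of the $ anchor, the only match is the last two characters, and Perl replaces that match with what you specified (here group 2, the last character), which drops the second-last one.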
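Wrapped in a small function, the same substitution works for any string of at least two characters and leaves shorter strings alone, since the pattern needs two characters to match. `drop_second_last` is a hypothetical helper name, not a built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the second-last character of a string.
sub drop_second_last {
    my ($s) = @_;
    $s =~ s/(.)(.)$/$2/;    # keep only group 2, the final character
    return $s;
}

printf "%s\n", drop_second_last("ti.be.tan.|");    # ti.be.tan|
printf "%s\n", drop_second_last("hello");          # helo
printf "%s\n", drop_second_last("a");              # a  (too short, no match)
```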

Hope this helps.

You could also use substr as an lvalue in this situation:

$char = "ti.be.tan.|";
substr($char,-2,1) = "";
print $char;               # ===>  ti.be.tan|
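The four-argument form of `substr` does the same replacement without the lvalue assignment, and it also returns the character it removed. A minimal sketch; `$removed` is an illustrative name.

```perl
use strict;
use warnings;

my $char = "ti.be.tan.|";

# substr(EXPR, OFFSET, LENGTH, REPLACEMENT): replace 1 character
# starting 2 from the end with the empty string.
my $removed = substr($char, -2, 1, "");

print "$char\n";       # ti.be.tan|
print "$removed\n";    # .
```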

There's also a method that uses a positive lookahead assertion to remove the second-last character:

$char =~ s/.(?=.$)//;

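This essentially reads: substitute "" for any character that is immediately followed by a single character and then the end of the string. The lookahead checks what follows without consuming it, so only the character matched by the first . is removed.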
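The difference between a lookahead `(?=...)` and a non-capturing group `(?:...)` is easy to miss, so this sketch runs both on copies of the example string. `$look` and `$group` are illustrative names.

```perl
use strict;
use warnings;

my $char = "ti.be.tan.|";

# (?=.$) is zero-width: it requires one more character before the end
# but does not consume it, so only the leading "." match is deleted.
(my $look = $char) =~ s/.(?=.$)//;
print "$look\n";     # ti.be.tan|

# (?:.$) is just a non-capturing group. It DOES consume the last
# character, so both of the final two characters are deleted.
(my $group = $char) =~ s/.(?:.$)//;
print "$group\n";    # ti.be.tan
```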

If the second-last character is always a specific character, you can replace the first . with that character. Remember to escape regex metacharacters such as ( ) [ ] . * ? | + ^ $ \ (and / when it is the delimiter).
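A sketch of that variant, first on the question's ASCII stand-in and then on an assumed Tibetan string that mirrors it: the tsheg (U+0F0B, the syllable separator) directly before a final shad (U+0F0D). The Tibetan sample and the names `$sep` and `$tibetan` are illustrative assumptions; `\Q...\E` escapes whatever separator is stored in `$sep`.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# ASCII stand-in: delete a "." only when exactly one character follows it.
my $char = "ti.be.tan.|";
$char =~ s/\.(?=.$)//;
print "$char\n";                 # ti.be.tan|

# Assumed Tibetan input: "bod" + tsheg + shad, written as code points.
my $sep     = "\x{0F0B}";                                  # tsheg
my $tibetan = "\x{0F56}\x{0F7C}\x{0F51}\x{0F0B}\x{0F0D}";
$tibetan =~ s/\Q$sep\E(?=.$)//;  # drop the tsheg before the final shad
print "$tibetan\n";
print length($tibetan), "\n";    # 4
```

This only works because the string holds decoded characters. If the Tibetan text is written literally in the script, `use utf8;` is needed so Perl decodes the source as UTF-8, and files should be read through an `:encoding(UTF-8)` layer; otherwise `.` matches a single byte of a multi-byte character.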

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM