
Perl regex that captures a substring between tick marks

I am trying to find a Perl solution that captures the filename in the following string, that is, the text between the tick marks.

my $str = "Saving to: ‘wapenc?T=mavodi-7-13b-2b-3-96-1e3431a’";

(my $results) = $str =~ /‘(.*?[^\\])‘/;
print $results if $results;

I need to end up with wapenc?T=mavodi-7-13b-2b-3-96-1e3431a

The closing tick in your regex differs from the one in the input string: the string ends with char 8217 (RIGHT SINGLE QUOTATION MARK, U+2019), but the regex uses char 8216 (LEFT SINGLE QUOTATION MARK, U+2018) in both positions. Also, when using Unicode characters in the source, be sure to include

use utf8;

and save the file UTF-8 encoded.
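One way to confirm which quotation marks a string actually contains is to print the code point of every non-ASCII character. The following is a minimal sketch, not part of the original answer; the helper name non_ascii_code_points is hypothetical, and the string is written with \x{...} escapes so the example does not depend on how the file is saved.

```perl
use strict;
use warnings;

# Hypothetical helper: return the code points (as "U+XXXX" strings) of all
# non-ASCII characters in an already-decoded string.
sub non_ascii_code_points {
    my ($text) = @_;
    return map { sprintf('U+%04X', ord $_) } $text =~ /([^\x00-\x7f])/g;
}

# Same text as in the question, with the quotes written as escapes.
my $str = "Saving to: \x{2018}wapenc?T=mavodi-7-13b-2b-3-96-1e3431a\x{2019}";

print join(' ', non_ascii_code_points($str)), "\n";   # prints "U+2018 U+2019"
```

Seeing two different code points makes it clear that the opening and closing marks need different characters in the pattern.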

After fixing these two issues, the code worked for me:

#! /usr/bin/perl
use warnings;
use strict;
use utf8;

my $str = "Saving to: ‘wapenc?T=mavodi-7-13b-2b-3-96-1e3431a’";

(my $results) = $str =~ /‘(.*?[^\\])’/;
print $results if $results;
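If relying on the file's encoding is inconvenient, a possible alternative is to spell both quotation marks as \x{2018} and \x{2019} escapes, which work without use utf8. This is only a sketch under that assumption: the sub name capture_between_quotes is hypothetical, and the pattern drops the [^\\] guard from the code above for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the text between a LEFT SINGLE QUOTATION MARK
# (U+2018) and the next RIGHT SINGLE QUOTATION MARK (U+2019).
# Returns undef when no such pair is found.
sub capture_between_quotes {
    my ($text) = @_;
    my ($captured) = $text =~ /\x{2018}(.*?)\x{2019}/;
    return $captured;
}

my $str = "Saving to: \x{2018}wapenc?T=mavodi-7-13b-2b-3-96-1e3431a\x{2019}";

my $results = capture_between_quotes($str);
print "$results\n" if defined $results;   # prints the filename
```

Testing with defined rather than plain truthiness avoids skipping a legitimate capture such as "0".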

Your tick characters aren't in the 7-bit ASCII character set, so there is a whole character-encoding rabbit hole to go down here. But the quick and dirty solution is to capture everything between the non-ASCII characters.

my ($result) = $str =~ /[^\0-\x7f]+(.*?)[^\0-\x7f]/;

[^\0-\x7f] matches characters whose values are not between 0 and 127, i.e., anything outside the 7-bit ASCII range (a range that itself includes newlines, tabs, and other control characters). This regular expression will work whether your input is UTF-8 encoded or has already been decoded, and may work for other character encodings, too.
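The claim about encoded versus decoded input can be checked with a short sketch. The sub name between_non_ascii is hypothetical; the byte string is produced with the built-in utf8::encode, which converts a copy of the decoded string to its UTF-8 bytes in place.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above: skip a run of non-ASCII
# characters (or bytes), then capture lazily up to the next non-ASCII one.
sub between_non_ascii {
    my ($text) = @_;
    my ($result) = $text =~ /[^\0-\x7f]+(.*?)[^\0-\x7f]/;
    return $result;
}

# Decoded form: each quotation mark is a single character.
my $decoded = "Saving to: \x{2018}wapenc?T=mavodi-7-13b-2b-3-96-1e3431a\x{2019}";

# Encoded form: each quotation mark becomes three bytes, all above 0x7f.
my $bytes = $decoded;
utf8::encode($bytes);

print between_non_ascii($decoded), "\n";
print between_non_ascii($bytes), "\n";
```

Both calls should print the same filename: the + after the first character class absorbs all three bytes of the opening quote in the encoded case, and the first byte of the closing quote is enough to end the lazy capture.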

