Using a match from a regex in another regex: skipping over metacharacters

I have a regular expression (REGEX 1) plus some Perl code that picks out a specific string of text, call it the START_POINT, from a large text document. This START_POINT is the beginning of a larger string of text that I want to extract from the document. I want to use another regular expression (REGEX 2) to extract everything from START_POINT to an END_POINT. I have a set of words to use in REGEX 2 that will easily find the END_POINT.

Here is my problem. The START_POINT string may contain metacharacters, which the regular expression will interpret as pattern syntax rather than as literal text. I don't know ahead of time which ones these will be: I am processing a large set of text documents, and the START_POINT varies from document to document. How do I tell a regular expression to treat a string as literal text and not as a pattern containing metacharacters?

Perhaps this code will make things clearer. $START_POINT was identified by code above this snippet and is an extracted part of the large text string $TEXT.

my $END_POINT = "(STOP|CEASE|END|QUIT)";

my @NFS = $TEXT =~ m/(($START_POINT).*?($END_POINT))/misog;

I have tried to use the quotemeta function, but haven't had any success. It seems to destroy the integrity of the $START_POINT string by adding backslashes, which change the text.

So, to summarize, I am looking for some way to tell the regular expression to look for the exact string in $START_POINT without interpreting any part of it as a metacharacter, while still maintaining the integrity of the string. Although I may be able to get quotemeta to work, do you know of any other options?

Thanks in advance for your help!

You need to convert the text to a regex pattern. That's what quotemeta does.

 my $start = '*';
 my $start_pat = quotemeta($start);  # * => \*
 /$start_pat/                        # Matches "*"

quotemeta can also be accessed via \Q..\E inside a pattern:

 my $start = '*';
 /\Q$start\E/                        # Matches "*"
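
Applied to the code in the question, this might look like the sketch below. The sample values of $TEXT and $START_POINT are made up for illustration; in the real script they come from the earlier code. Two things are assumptions on my part: $END_POINT is written as a qr// object so it adds no extra capture group, and the /o and /m modifiers are left out (/o compiles the pattern only once, which would ignore a $START_POINT that changes from document to document, and /m only affects ^ and $, which this pattern does not use).

```perl
use strict;
use warnings;

# Hypothetical sample data; the real $TEXT and $START_POINT come from earlier code.
my $TEXT        = "noise a+b (v1.0) [x] first body STOP more a+b (v1.0) [x] second\nbody END tail";
my $START_POINT = 'a+b (v1.0) [x]';             # contains + ( ) . [ ] metacharacters
my $END_POINT   = qr/STOP|CEASE|END|QUIT/i;     # a real pattern, so it is NOT quoted

# \Q...\E escapes $START_POINT only while the pattern is being built;
# the variable itself keeps its original value.
# /s lets . match newlines, /i ignores case, /g collects every match.
my @NFS = $TEXT =~ /(\Q$START_POINT\E.*?$END_POINT)/sig;

print "$_\n---\n" for @NFS;
```

The backslashes that quotemeta adds exist only in the pattern, so the matched text in @NFS and the contents of $START_POINT are unchanged. That is why the escaping does not "destroy" the string as long as the quoted copy is used solely as a pattern.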

Why reimplement quotemeta?
