I am new to Perl and regex and I need to extract all the strings from a text file. A string is identified by anything that is wrapped by double quotes.
Examples of strings:
"This is string"
"1!=2"
"This is \"string\""
"string1"."string2"
"S
t
r
i
n
g"
The code:
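use strict;
use warnings;
open(my $fh, '<', 'text.txt') or die "Cannot open text.txt: $!";
local $/;                 # slurp mode: read the whole file in one go
my $text = <$fh>;
# Greedy: returns the outermost "..." in example 4 as a single match
my @strings = $text =~ m/".*"/g;
# Fixes the issue above, but does not handle the escaped quotes in example 3
my @strings2 = $text =~ m/"[^"]*"/g;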
Edit: I want to match (1) a double quote, followed by (2) zero or more occurrences of either a character that is neither a double quote nor a backslash, or a backslash followed by any character, followed by (3) a double quote. In other words, (2) can be anything except an unescaped ".
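The three-part description above can be written out almost word for word with the /x flag. This is only a sketch: the variable names $quoted, $line and @found are made up for illustration, and the /s flag is an added assumption so that a backslash followed by a newline also counts as "a backslash followed by any character".

```perl
use strict;
use warnings;

# Hypothetical name; the pattern follows the three-part description above.
my $quoted = qr{
    "                # (1) opening double quote
    (?:
        [^"\\]       #     a character that is neither a quote nor a backslash
      | \\ .         #     or a backslash followed by any character
    )*               # (2) repeated zero or more times
    "                # (3) closing double quote
}xs;                 # /x allows this layout, /s lets . match a newline too

# Illustrative input, not taken from the real text.txt
my $line  = '"This is \"string\"" and "1!=2"';
my @found = $line =~ /$quoted/g;

print "$_\n" for @found;
```

Excluding the backslash from the character class means a backslash can only be consumed together with the character it escapes, so the pattern cannot backtrack into treating an escaped quote as the closing one.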
The regex provided below, m/"(?:\\.|[^"])*"/g, works for the examples above. However, when there is a line with "string1".string2."string2"
it returns "string1" string2 "string3"
Is there any way to skip the previously matched word?
Can anyone please help?
One possible approach:
/"(?:\\.|[^"])*"/
... which reads as:
a double quotation mark,
followed by any number of...
--- either any escaped character (any symbol preceded by a backslash, \)
--- or any character that's not a double quotation mark,
followed by a closing double quotation mark.
The key trick here is the alternation: its first branch consumes any escaped symbol, including an escaped double quotation mark, so an escaped quote is never mistaken for the end of the string.
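To see how the pattern behaves on the examples from the question, here is a minimal sketch. The heredoc stands in for text.txt and is an assumption, as is the last line, which is one reading of the "string1".string2 case the asker mentions.

```perl
use strict;
use warnings;

# Sample text modelled on the question's examples (not the real text.txt).
# A single-quoted heredoc keeps the backslashes exactly as written.
my $text = <<'END';
"This is string"
"1!=2"
"This is \"string\""
"string1"."string2"
"S
t
r
i
n
g"
"string1".string2."string3"
END

# In list context, /g returns every match; the search resumes after the
# closing quote of the previous match, so unquoted text in between is skipped.
my @strings = $text =~ /"(?:\\.|[^"])*"/g;

print "[$_]\n" for @strings;
```

Because [^"] also matches a newline, the multi-line example comes back as a single element, and the unquoted .string2. in the last line is not returned at all.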