I'm trying to find both single and multi-line comments in an HTML file. I've stripped it down to just a few examples, and some other content just to have something there.
I've read a lot of the entries here but can't get a definitive answer to this. I'm reading in the HTML file in "slurp" mode, and doing a match of my pattern. This code runs now and prints only the first match.
#!C:\Perl\bin\perl.exe
BEGIN { unshift @INC, 'C:\rmhperl'; }
use warnings;
no warnings 'uninitialized';
chdir 'c:\watts\html';
open FILE, "test.html" or print 'error opening file "test.html" ';
my $text = do { local $/; <FILE> };
close(FILE);
if ($text =~ m/(?s)(<!--.*?)(-->\n)/sg) {
print "1 = $1 2= $2\n";
}
exit;
I've set up single and multi-line comments in the HTML file. I can get one or the other printed but not both (at least in "slurp" mode).
I'm told I should be able to accomplish this with a single regex, so the objective is "find all HTML comments, regardless of their being single/multi-line comments".
I built the regex to find both, but it finds only the first match -- a multi-line comment.
I'm trying to find a way to find every match, whether it occurs on one line or multiple lines. I can find one or the other, but I can't get them to work with one regex.
I can do non-slurp mode, and find the <!-- tag, then loop until I see the --> tag, but wanted to see if I can get it to work with a single regex.
I've been reading about this and trying to find relevant examples, but I can't see what I'm missing. Here's the HTML file snippet I have been using for the regex:
<!DOCTYPE html>
<script type="text/javascript" src="fadeslideshow.js"></script>
<style>
.divTable {
display: block;
width: 100%;
}
.divTableBody, .divTableRow{ clear: both; }
.divTableCell {
border: 1px solid #999999;
float: left;
overflow: hide;
padding: 2%;
width: 45%; }
.divTable:after {
display: block;
font-size: 0;
content: " ";
clear: both;
height: 100px; }
</style>
<style type="text/css">
<!--
a:link {color: #0000ff;}
a:visited {color: #3563a8;}
a:active {color: #000000;}
a:hover {background-color: #000000;}
a {text-decoration: none;}
-->
</style>
</head>
<body class="home">
<div id="white_back">
<div style="text-align: center">
</div>
<div class="chromestyle" id="chromemenu">
<ul>
<!-- <li><a href="xyz.com">Home</a></li>
-->
<li><a href="#" rel="dropmenu0">About Us</a></li>
<li><a href="#" rel="dropmenu5">Publications</a></li>
</ul>
</div>
<!--1st drop down menu
-->
<div id="dropmenu0" class="dropmenudiv">
</div>
<!--2nd drop down menu -->
<div id="dropmenu1" class="dropmenudiv">
</div>
I presume this is production code, in which case your manager is a scary man as this sort of practice can result in hard-to-find bugs. That's acceptable if the code is only for yourself, but inflicting that on others is unfair
Some notes on your code
The shebang line #! is unnecessary on Windows systems, and in fact does nothing unless you specify command-line options there. It's best to drop it altogether
Always use strict and use warnings 'all', and fix the bugs rather than disabling warnings with no warnings 'uninitialized'
BEGIN { unshift @INC, 'C:\\rmhperl' } is best written use lib 'C:\\rmhperl', but you're not using libraries in this case so it will have no effect
You should use lexical file handles with the three-parameter form of open
There is no need for (?s) in the regex pattern as well as the /s modifier. Unless you are doing something fancy like enabling options for only part of the pattern (which you're not), people will understand you better if you use the modifier /s
The reason you're only finding one comment is that you're only asking for one. In scalar context a global regex pattern match will iterate through all the matches in the target string one at a time. You only call it once so it finds only the first. You can fix that by using a while in place of if
I've improved your regex pattern somewhat by making sure that the opening <!-- isn't followed by > or by -> which would form an illegal HTML comment. There may also be optional whitespace between the closing -- and the >, so I've allowed for that. And you are insisting on a newline after the end of the comment, which may not be there, so I've removed that
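As a rough sketch of the pattern described in the last note, here it is spelled out with the /x modifier so that each part can carry a comment. The sample string and the names $sample, $comment_re and @found are made up for illustration and are not taken from your file

```perl
use strict;
use warnings 'all';

# Hypothetical sample text: two real comments and one illegal "<!-->" opener
my $sample = "<p>a</p><!-- one -->\n<!--two\nlines-- >\n<!--> not a comment";

my $comment_re = qr{
    (               # capture the whole comment
      <!--          # comment opener
      (?! -? > )    # must not be followed directly by ">" or "->"
      .*?           # shortest possible body; /s lets . match newlines
      -- \s* >      # closer, allowing whitespace between "--" and ">"
    )
}xs;

# In list context a /g match returns every captured comment at once
my @found = $sample =~ /$comment_re/g;
print "$_\n" for @found;
```

With this sample, only the first two comments should be captured; the bare <!--> is rejected by the negative look-ahead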
This code seems to work with your data
use strict;
use warnings 'all';
my $text = do {
open my $fh, '<', 'test.html' or die qq{Unable to open file "test.html" for input: $!};
local $/;
<$fh>;
};
while ( $text =~ /(<!--(?!-?>).*?--\s*>)/sg ) {
my $comment = $1;
print $comment, "\n";
}
output
<!--
a:link {color: #0000ff;}
a:visited {color: #3563a8;}
a:active {color: #000000;}
a:hover {background-color: #000000;}
a {text-decoration: none;}
-->
<!-- <li><a href="xyz.com">Home</a></li>
-->
<!--1st drop down menu
-->
<!--2nd drop down menu -->
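To illustrate the note about scalar context, here is a small sketch that runs the same pattern three ways: once with if, in a while loop, and in list context. The string in $html is invented for the example, and the array names are only there to make the counts easy to compare

```perl
use strict;
use warnings 'all';

# Invented sample with one single-line and one multi-line comment
my $html = "<!-- a -->\n<ul>\n<!-- b\n-->\n</ul>\n";
my $re   = qr/(<!--(?!-?>).*?--\s*>)/s;

# if: the match runs only once, so only the first comment is seen
my @with_if;
if ( $html =~ /$re/g ) {
    push @with_if, $1;
}

# Reset the search position that the successful /g match left behind
pos($html) = undef;

# while: each iteration resumes where the previous match ended
my @with_while;
while ( $html =~ /$re/g ) {
    push @with_while, $1;
}

# List context: every capture is returned in one go
my @with_list = $html =~ /$re/g;

printf "if: %d, while: %d, list: %d\n",
    scalar @with_if, scalar @with_while, scalar @with_list;
```

The pos reset matters here only because the same string is matched again after the if; without it the while loop would carry on from the end of the first comment and miss it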