I want to verify that the code a user enters is HTML (it must start with <html> and end with </html>).
I tried this:
var reghtml = new RegExp("(<html>*\\n+</html>)");
but the problem is that it requires a newline (\n) in the code. I need to verify the first and last tags (<html> and </html>), and if the user puts anything between them, it must start with < and end with >.
Is there any solution?
You shouldn't use regular expressions to validate HTML (let alone parse it) because HTML is not a "regular language".
Here's an example of a false negative: valid HTML that a straightforward validating regular expression would mark as invalid:
<html>
<head>
<!-- </html> -->
</head>
<body>
<p>This is valid HTML</p>
</body>
</html>
And because an HTML comment can itself contain further <!-- sequences (it still ends at the first -->), you can't write a straightforward regex for this particular case either:
<html>
<head>
<!-- <!-- <!-- <!-- </html> -->
</head>
<body>
<p>This is valid HTML</p>
</body>
</html>
And here's a false positive (assuming you don't use the ^ and $ regex anchors):
<p>illegal element</p>
<html>
<img>illegal text node</img>
</html>
<p>another illegal element</p>
Granted, there are more powerful implementations of regular expressions that add rudimentary support for things like counting depth, but then you're in for a world of hurt.
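To make the two failure modes above concrete, here is a minimal sketch that runs without a browser. The pattern `naivePattern` is a hypothetical stand-in for the kind of regex one might write first; it is not taken from the question, and the variable names are illustrative only.

```javascript
// Hypothetical naive pattern (an assumption, not the asker's regex):
// an opening <html> tag, anything at all, then the first </html>.
const naivePattern = /<html>[\s\S]*?<\/html>/;

// Valid document whose comment happens to contain "</html>".
const commentedDoc = [
  '<html>',
  '<head>',
  '<!-- </html> -->',
  '</head>',
  '<body>',
  '<p>This is valid HTML</p>',
  '</body>',
  '</html>'
].join('\n');

// Invalid input: stray elements before and after the <html> element.
const surroundedDoc = [
  '<p>illegal element</p>',
  '<html>',
  '<img>illegal text node</img>',
  '</html>',
  '<p>another illegal element</p>'
].join('\n');

// The match ends at the "</html>" inside the comment, so the regex
// has the wrong idea of where the document stops.
const matched = commentedDoc.match(naivePattern)[0];
const stoppedInsideComment = matched.endsWith('<!-- </html>');

// Without ^ and $ anchors, the stray elements go unnoticed.
const acceptsInvalidInput = naivePattern.test(surroundedDoc);

console.log(stoppedInsideComment, acceptsInvalidInput);
```

Anchoring the pattern or making the quantifier greedy changes which inputs are misjudged, but it does not give the regex any understanding of comments or nesting.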
The correct way to validate HTML is to use an HTML DOM library. In .NET this is HtmlAgilityPack. In browser-based JavaScript it's even simpler: just use the browser's built-in parser (innerHTML):
(stolen from "Check if HTML snippet is valid with Javascript")
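function isValidHtml(html) {
  // Parse into a detached document so the current page is not touched.
  var doc = document.implementation.createHTMLDocument("");
  doc.documentElement.innerHTML = html;
  // Strict round-trip check: any markup the parser had to repair or
  // normalise will serialise differently from the input.
  return doc.documentElement.innerHTML === html;
}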
Here is a pattern for you. It checks whether each first-level element has a valid opening and closing tag. First-level elements must have closing tags, so you can't do <html><img /></html>; to allow that, remove the closing-tag part of the pattern.
var validHtml = '\
<html itemscope>\
<head></head>\
<body style="background: red;">\
Everything is fine\
</body>\
</html>\
',
invalidHtml = '\
<html itemscope>\
<head></foot>\
<body>\
Nothing is fine\
</body>\
</html>\
',
pattern = /^\s*<html(?:\s[^>]*)?>(?:\s*<(\w+)(?:\s[^>]+)?>(?:.|\s)*<\/\1>\s*)*<\/html>\s*$/i;

console.log(pattern.test(validHtml) ? 'valid' : 'invalid');
console.log(pattern.test(invalidHtml) ? 'valid' : 'invalid');
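As a rough illustration of what "first level" means here, the sketch below wraps the same pattern in a helper and tries two more inputs. The helper name `firstLevelLooksBalanced` and the sample strings are made up for this example.

```javascript
// Same pattern as in the answer, wrapped in a hypothetical helper.
const firstLevelPattern =
  /^\s*<html(?:\s[^>]*)?>(?:\s*<(\w+)(?:\s[^>]+)?>(?:.|\s)*<\/\1>\s*)*<\/html>\s*$/i;

function firstLevelLooksBalanced(html) {
  return firstLevelPattern.test(html);
}

// A self-closing first-level element has no matching closing tag,
// so the pattern rejects it.
const selfClosing = '<html><img /></html>';

// A mismatch deeper down (<p> closed by </div>) is not inspected at all,
// because only the direct children of <html> are checked.
const nestedMismatch = '<html><head></head><body><p></div></body></html>';

const selfClosingResult = firstLevelLooksBalanced(selfClosing);
const nestedMismatchResult = firstLevelLooksBalanced(nestedMismatch);

console.log(selfClosingResult, nestedMismatchResult);
```

So a "valid" result from this pattern only says that the outer structure looks balanced; anything nested further down still needs a real parser.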