Regular expression, HTML tags, JavaScript

Question

I want to verify whether the code a user enters is HTML (it must start with <html> and end with </html>).

I tried this:

 var reghtml = new RegExp("(<html>*\\n+</html>)");

but the problem is that it requires a \n in the code. I need to verify the opening and closing tags (<html> and </html>), and if the user puts anything between them, it must start with < and end with >.

Is there any solution?

Answer 1

You shouldn't use regular expressions to validate HTML (let alone parse it) because HTML is not a "regular language".

So here's an example of a false negative: a valid document that a straightforward regular expression written to validate HTML would mark as invalid:

<html>
<head>
    <!-- </html> -->
</head>
<body>
    <p>This is valid HTML</p>
</body>
</html>
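
To make the false negative concrete, here is a minimal sketch. The pattern `naivePattern` is a hypothetical example (it is not from the question) of a regex that tries to be strict by allowing `</html>` only at the very end of the input. It rejects the document above even though that document is valid:

```javascript
// Hypothetical "strict" pattern (an assumption for illustration):
// <html> at the start, </html> at the end, and no other </html> in between.
const naivePattern = /^<html>(?:(?!<\/html>)[\s\S])*<\/html>$/;

const validDocument = [
  '<html>',
  '<head>',
  '    <!-- </html> -->',
  '</head>',
  '<body>',
  '    <p>This is valid HTML</p>',
  '</body>',
  '</html>',
].join('\n');

// The </html> inside the comment trips the pattern.
const acceptsValidDocument = naivePattern.test(validDocument);
console.log(acceptsValidDocument); // false
```

Loosening the pattern so that it tolerates `</html>` in the middle avoids this rejection, but the regex then no longer knows whether that `</html>` sat inside a comment or was a real, misplaced closing tag.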

And because a comment can itself contain further <!-- sequences (comments don't truly nest; the comment ends at the first -->), you can't write a straightforward regex for this particular case either:

<html>
<head>
    <!-- <!-- <!-- <!-- </html> -->
</head>
<body>
    <p>This is valid HTML</p>
</body>
</html>

And here's a false positive (assuming you don't use the ^ and $ regex anchors):

<p>illegal element</p>
<html>
    <img>illegal text node</img>
</html>
<p>another illegal element</p>
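
A short sketch of that false positive follows. Both `unanchoredPattern` and `anchoredPattern` are hypothetical patterns chosen for illustration, not ones taken from the question:

```javascript
// Hypothetical unanchored pattern: some <html> ... </html> pair, anywhere.
const unanchoredPattern = /<html>[\s\S]*<\/html>/;
// Hypothetical anchored variant: the pair must span the whole input.
const anchoredPattern = /^<html>[\s\S]*<\/html>$/;

const invalidDocument = [
  '<p>illegal element</p>',
  '<html>',
  '    <img>illegal text node</img>',
  '</html>',
  '<p>another illegal element</p>',
].join('\n');

// Only the part between <html> and </html>, still containing the bad <img>.
const innerOnly = '<html>\n    <img>illegal text node</img>\n</html>';

console.log(unanchoredPattern.test(invalidDocument)); // true  (false positive)
console.log(anchoredPattern.test(invalidDocument));   // false (outer <p> elements caught)
console.log(anchoredPattern.test(innerOnly));         // true  (bad <img> content still accepted)
```

The anchors catch the stray elements outside <html>, but nothing in either pattern inspects what sits between the two tags.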

Granted, there are more powerful implementations of regular expressions that add rudimentary support for things like depth counting, but then you're in for a world of hurt.

The correct way to validate HTML is to use an HTML DOM library. In .NET this is HtmlAgilityPack. In browser-based JavaScript it's even simpler: just use the browser's built-in parser (innerHTML):

(stolen from "Check if HTML snippet is valid with Javascript")

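function isValidHtml(html) {
    // Let the browser's own parser build a detached document from the input.
    var doc = document.implementation.createHTMLDocument("");
    doc.documentElement.innerHTML = html;
    // The input counts as valid only if it survives the round trip unchanged.
    return ( doc.documentElement.innerHTML === html );
}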

Answer 2

Here's a pattern for you. It checks whether each first-level element has a valid opening and closing tag. First-level elements must have closing tags, so you can't do <html><img /></html>; to allow that, you can remove the whole closing-tag checking part of the pattern.

 var validHtml = '\
 <html itemscope>\
 <head></head>\
 <body style="background: red;">\
 Everything is fine\
 </body>\
 </html>\
 ',
 invalidHtml = '\
 <html itemscope>\
 <head></foot>\
 <body>\
 Nothing is fine\
 </body>\
 </html>\
 ',
 pattern = /^\s*<html(?:\s[^>]*)?>(?:\s*<(\w+)(?:\s[^>]+)?>(?:.|\s)*<\/\1>\s*)*<\/html>\s*$/i;

 console.log(pattern.test(validHtml) ? 'valid' : 'invalid');
 console.log(pattern.test(invalidHtml) ? 'valid' : 'invalid');
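
The first-level restriction described above can be checked with a small sketch. It reuses the answer's pattern, written with single backslashes as it would appear in a JavaScript regex literal; the two input strings and the variable names are made up for illustration:

```javascript
// The answer's pattern, as a regex literal.
const firstLevelPattern = /^\s*<html(?:\s[^>]*)?>(?:\s*<(\w+)(?:\s[^>]+)?>(?:.|\s)*<\/\1>\s*)*<\/html>\s*$/i;

// Hypothetical inputs: every first-level element is closed, and <img /> is nested deeper...
const withClosingTags = '<html><head></head><body><img /></body></html>';
// ...versus a self-closing element directly under <html>.
const voidAtFirstLevel = '<html><img /></html>';

console.log(firstLevelPattern.test(withClosingTags));  // true
console.log(firstLevelPattern.test(voidAtFirstLevel)); // false
```

Note that the `<img />` inside `<body>` is accepted only because the pattern does not look below the first level at all.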

Answer 1
2 2016-11-26 23:02:20

Answer 2
1 2016-11-26 22:52:25