Remove all REAL Javascript comments in PHP

I'm looking for a solution to strip all JavaScript comments from HTML code using PHP.

I want to strip only JavaScript comments (not HTML comments and so on). I think a regex is not a solution, because it cannot tell whether something is a real comment or part of a string. Example:

<script>

// This is a comment
/* This is another comment */

// The following is not a comment
var src="//google.com"; 

</script>

Is there a way to do it? Many thanks in advance.
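A quick way to see the problem described in the question: the JavaScript sketch below (JavaScript is used only for illustration, and naiveStrip is a hypothetical name) deletes everything from // to the end of each line. It removes the real comment, but it also cuts the string literal in half:

```javascript
// Hypothetical helper: delete "//" and everything after it on each line.
// The regex has no idea whether "//" sits inside a string literal.
function naiveStrip(code) {
  return code.replace(/\/\/.*$/gm, '');
}

const sample = '// This is a comment\nvar src="//google.com";';

// JSON.stringify makes the newline and the broken string visible.
console.log(JSON.stringify(naiveStrip(sample)));
// → "\nvar src=\""
```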

First thing to do: extract the content of the script tags. For that, use DOMDocument:

$dom = new DOMDocument;
$dom->loadHTML($html);

$scriptNodes = $dom->getElementsByTagName('script');

The second step consists of removing all the JavaScript comments from each script node.

You can use a third-party JavaScript parser if you want, but you can do it with a regex too. All you need is to prevent the parts between quotes from being taken into account.

To do that, you must first match the parts between quotes and discard them. The only difficulty with JavaScript is that a quote can appear inside a regex literal, for example:
/pattern " with a quote/

So you also need to match regex literals, to prevent this kind of error.
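The "match the quoted parts first and keep them" idea can be sketched in JavaScript as follows. This is a simplified illustration, not a port of the PHP pattern: stripComments and STRING_OR_COMMENT are hypothetical names, and regex literals are deliberately not handled yet. Because the alternation tries string literals before comments, a // inside a string is consumed as part of the string and put back unchanged:

```javascript
// Hypothetical names, for illustration only.
// Group 1 captures a complete string literal ('...', "..." or `...`);
// the other two alternatives match // line comments and /* */ block comments.
// Regex literals such as /a"b/ are NOT recognised by this sketch.
const STRING_OR_COMMENT =
  /('(?:[^'\\\n]|\\.)*'|"(?:[^"\\\n]|\\.)*"|`(?:[^`\\]|\\[\s\S])*`)|\/\/[^\n]*|\/\*[\s\S]*?\*\//g;

function stripComments(code) {
  // Strings are put back unchanged; comments are replaced by an empty string.
  return code.replace(STRING_OR_COMMENT, (match, str) => (str !== undefined ? str : ''));
}

const example = 'var src="//google.com"; // a real comment';
console.log(JSON.stringify(stripComments(example)));
// → "var src=\"//google.com\"; "
```

In PHP, preg_replace_callback() can apply the same keep-or-drop logic; the pattern in this answer gets the same effect inside a single preg_replace() call by skipping strings with (*SKIP)(*FAIL).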

Pattern example:

$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<squoted> ' [^'\n\\]*+ (?: \\. [^'\n\\]* )*+ ' )
    (?<dquoted> " [^"\n\\]*+ (?: \\. [^"\n\\]* )*+ " )
    (?<tquoted> ` [^`\\]*+ (?s: \\. [^`\\]*)*+ ` )
    (?<quoted>  \g<squoted> | \g<dquoted> | \g<tquoted> )
    
    (?<scomment> // \N* )
    (?<mcomment> /\* [^*]*+ (?: \*+ (?!/) [^*]* )*+ \*/ )
    (?<comment> \g<scomment> | \g<mcomment> )
    
    (?<pattern> / [^\n/*] [^\n/\\]*+ (?>\\.[^\n/\\]*)* / [gimuy]* ) 
)

(?=[[(:,=/"'`])
(?|
    \g<quoted> (*SKIP)(*FAIL)
  |
    ( [[(:,=] \s* ) (*SKIP) (?: \g<comment> \s* )*+ ( \g<pattern> )
  | 
    ( \g<pattern> \s* ) (?: \g<comment> \s* )*+ 
    ( \. \s* ) (?:\g<comment> \s* )*+ ([A-Za-z_]\w*)
  |
    \g<comment>
)
~x
EOD;

Then replace the content of each script node:

foreach ($scriptNodes as $scriptNode) {
    $scriptNode->nodeValue = preg_replace($pattern, '$9${10}${11}', $scriptNode->nodeValue);
}

$html = $dom->saveHTML();


Pattern details:

(?(DEFINE)...) is an area where you can put all the subpattern definitions you will need later. The "real" pattern begins after it.

(?<name>...) are named subpatterns. They work like capture groups, except that you can refer to them by name instead of by number: \g<name> is a subroutine call that re-applies the named subpattern at that point, so each definition is written only once.

*+ is a possessive quantifier: like *, but it never gives back characters when the engine backtracks.

\N means a character that is not a newline.

(?=[[(:,=/"'`]) is a lookahead that checks whether the next character is one of [ ( : , = / " ' `. The goal of this test is to avoid trying each branch of the following alternation when the character is a different one. If you remove it, the pattern works the same; it only lets the engine quickly skip useless positions in the string.

(*SKIP) is a backtracking control verb. When the pattern fails after it, the engine does not retry a match at any of the positions consumed before it; the next attempt starts at the position where (*SKIP) was reached.

(*FAIL) is a backtracking control verb too; it forces the pattern to fail at that point. Combined with (*SKIP), it makes the engine jump over quoted strings without ever matching them.

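(?|..(..)..(..)..|..(..)..(..)..) is a branch-reset group. Inside it, the capture groups of each branch share the same numbers (9, 10 and 11 for this pattern; group 11 only exists in the third branch), so the replacement '$9${10}${11}' works whichever branch matched. When the last branch (a bare comment) matches, none of these groups is set, so the comment is replaced by an empty string.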
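The regex-literal part of the pattern can also be sketched in JavaScript. The following is a simplified, hypothetical version (TOKEN and stripCommentsWithRegexLiterals are illustrative names) of the heuristic used by the PHP pattern: a / that follows one of [ ( : , = (plus optional whitespace) is taken to start a regex literal, which is kept as-is, just like a string. Unlike the PHP pattern, it does not handle comments placed around the literal; like the PHP pattern, it does not handle an unescaped / inside a character class of the literal:

```javascript
// Hypothetical names, for illustration only. Alternatives, tried in order:
//   group 1: a complete string literal ('...', "..." or `...`)  -> kept
//   group 2: one of [ ( : , = plus optional whitespace, followed by
//   group 3: a regex literal /.../flags                         -> kept
//   otherwise: a // line comment or a /* */ block comment       -> removed
const TOKEN =
  /('(?:[^'\\\n]|\\.)*'|"(?:[^"\\\n]|\\.)*"|`(?:[^`\\]|\\[\s\S])*`)|([[(:,=]\s*)(\/(?:[^\n/*\\]|\\.)(?:[^\n/\\]|\\.)*\/[gimsuy]*)|\/\/[^\n]*|\/\*[\s\S]*?\*\//g;

function stripCommentsWithRegexLiterals(code) {
  return code.replace(TOKEN, (match, str, prefix, regex) => {
    if (str !== undefined) return str;              // string literal
    if (regex !== undefined) return prefix + regex; // regex literal
    return '';                                      // comment
  });
}

const code = 'var re = /"/; var u = "http://x"; // note';
console.log(JSON.stringify(stripCommentsWithRegexLiterals(code)));
// → "var re = /\"/; var u = \"http://x\"; "
```

Without the regex-literal alternative, the quote inside /"/ would be paired with the next real quote, and the // in "http://x" would then be treated as the start of a comment.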

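Use this JavaScript function (note that it runs in JavaScript, not in PHP, and that it assumes any / that does not start a comment opens a regex literal, so a division operator can throw it off):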

function removeComments(str) {
    // Pad the input so that lookups such as str[i-2] and str[i+2] stay in range
    str = ('__' + str + '__').split('');
    var mode = {
        singleQuote: false,
        doubleQuote: false,
        regex: false,
        blockComment: false,
        lineComment: false,
        condComp: false
    };
    for (var i = 0, l = str.length; i < l; i++) {
        // Inside a regex literal: wait for the unescaped closing slash
        if (mode.regex) {
            if (str[i] === '/' && str[i-1] !== '\\') {
                mode.regex = false;
            }
            continue;
        }
        // Inside a single-quoted string
        if (mode.singleQuote) {
            if (str[i] === "'" && str[i-1] !== '\\') {
                mode.singleQuote = false;
            }
            continue;
        }
        // Inside a double-quoted string
        if (mode.doubleQuote) {
            if (str[i] === '"' && str[i-1] !== '\\') {
                mode.doubleQuote = false;
            }
            continue;
        }
        // Inside a block comment: blank every character up to and including */
        if (mode.blockComment) {
            if (str[i] === '*' && str[i+1] === '/') {
                str[i+1] = '';
                mode.blockComment = false;
            }
            str[i] = '';
            continue;
        }
        // Inside a line comment: blank characters until the next line break
        if (mode.lineComment) {
            if (str[i+1] === '\n' || str[i+1] === '\r') {
                mode.lineComment = false;
            }
            str[i] = '';
            continue;
        }
        // Inside a JScript conditional-compilation block /*@ ... @*/: keep it
        if (mode.condComp) {
            if (str[i-2] === '@' && str[i-1] === '*' && str[i] === '/') {
                mode.condComp = false;
            }
            continue;
        }
        mode.doubleQuote = str[i] === '"';
        mode.singleQuote = str[i] === "'";
        if (str[i] === '/') {
            if (str[i+1] === '*' && str[i+2] === '@') {
                mode.condComp = true;
                continue;
            }
            if (str[i+1] === '*') {
                str[i] = '';
                mode.blockComment = true;
                continue;
            }
            if (str[i+1] === '/') {
                str[i] = '';
                mode.lineComment = true;
                continue;
            }
            // Any other slash is assumed to start a regex literal
            mode.regex = true;
        }
    }
    // Drop the padding before joining, so a comment that runs to the end
    // of the input cannot make the slice cut off real characters
    return str.slice(2, -2).join('');
}

Use these two links:

http://trinithis.awardspace.com/commentStripper/stripper.html

http://james.padolsey.com/javascript/removing-comments-in-javascript/

For further reference, check Javascript comment stripper.

This RegExp will work for your example (it only finds comments that start at the beginning of a line):

^\/(?:\/|\*).*

PHP code:

$re = "/^\\/(?:\\/|\\*).*/m"; 
$str = "<script>\n\n// This is a comment\n/* This is another comment */\n\n// The following is not a comment\nvar src=\"//google.com\"; \n\n</script>"; 

preg_match_all($re, $str, $matches);
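To check what this pattern actually matches, here is the same expression as a JavaScript regex literal (the g flag is added so that match() returns every hit, and the sample lines are made up for illustration). Because of the ^ anchor, only comments that begin at the start of a line are found, so a trailing comment after code is missed. Note also that preg_match_all() only collects the matches; stripping them would need preg_replace().

```javascript
// $re from the answer above, written as a JavaScript regex literal
// (g flag added so that String.prototype.match returns every hit).
const lineStartComment = /^\/(?:\/|\*).*/gm;

// Made-up sample: two comments at line start, one trailing comment.
const js = [
  '// This is a comment',
  '/* This is another comment */',
  'var src="//google.com"; // trailing comment',
].join('\n');

console.log(JSON.stringify(js.match(lineStartComment)));
// → ["// This is a comment","/* This is another comment */"]
```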



Or maybe this one, which also checks for the closing */:

^\/{2}.*|\/\*.*\*\/$

PHP code:

$re = "/^\\/{2}.*|\\/\\*.*\\*\\/$/m"; 
$str = "<script>\n\n// This is a comment\n/* This is another comment */\n\n// The following is not a comment\nvar src=\"//google.com\"; \n\n</script>"; 

preg_match_all($re, $str, $matches);
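The same kind of check for the second pattern, again as a JavaScript regex literal with the g flag added and made-up sample lines: the \/\*.*\*\/$ branch only matches a block comment that opens and closes on the same line and ends that line, because . does not match a newline, and the ^\/{2} branch still misses trailing // comments.

```javascript
// The second $re, written as a JavaScript regex literal (g flag added).
const lineOrBlockComment = /^\/{2}.*|\/\*.*\*\/$/gm;

// Made-up sample covering three kinds of comments.
const js = [
  '/* single-line block comment */',
  '/* block comment',
  '   spanning two lines */',
  'var x = 1; // trailing comment',
].join('\n');

console.log(JSON.stringify(js.match(lineOrBlockComment)));
// → ["/* single-line block comment */"]
```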


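Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM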