
Match “//” comments with regex but not inside a quote

I need to match and replace some comments. For example:

$test = "the url is http://www.google.com";// comment "<-- that quote needs to be matched

I want to match the comments outside of the quotes, and replace every " in those comments with &quot;.

I have tried a number of patterns and different ways of running them, but with no luck.

The regex will be run with JavaScript to match PHP "//" comments.

UPDATE: I took the regex from borkweb below and modified it. I used a function from http://ejohn.org/blog/search-and-dont-replace/ and came up with this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title></title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <script type="text/javascript">
            function t_replace(data){
               var q = {}, ret = "";
                data.replace(/(?:((["'\/]*(("[^"]*")|('[^']*'))?[\s]*)?[\/\/|#][^"|^']*))/g, function(value){
                    q[key] = value;
                });
                for ( var key in q ){
                    ret =  q[key];
                }
                var text = data.split(ret);
                var out = ret + text[1];
                out = out.replace(/"/g,"&quot;");
                out = out.replace(/'/g,"&apos;");
                return text[0] + out;
            }
        </script>
    </head>
    <body>
        <script type="text/javascript">
            document.write(t_replace("$test = \"the url is http://www.google.com\";// c'o\"mment \"\"\"<-- that quote needs to be matched")+"<br>");
            document.write(t_replace("$test = 'the url is http://www.google.com';# c'o\"mment \"\"\"<-- that quote needs to be matched"));
        </script>
    </body>
</html>

It handles all the line comments outside of single or double quotes. Is there any way I could optimize this function?

UPDATE 2: It does not handle this string:

document.write(t_replace("$test //= \"the url is http://www.google.com\"; //c'o\"mment \"\"\"<-- that quote needs to be matched")+"<br>");

You can use a single regexp that matches all strings and comments at the same time. If the match is a string, you replace it with itself, unchanged; if it is a comment, you handle it as the special case.

I came up with this regex:

"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)

There are 3 parts:

  • "(\\[\s\S]|[^"])*" for matching double-quoted strings.
  • '(\\[\s\S]|[^'])*' for matching single-quoted strings.
  • (\/\/.*|\/\*[\s\S]*?\*\/) for matching both single-line and multiline comments.

The replace function checks whether the matched text is a comment. If it is not, the match is returned unchanged. If it is, every " and ' in it is replaced.

function t_replace(data){
    var re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
    return data.replace(re, function(all, strDouble, strSingle, comment) {
        if (comment) {
            return all.replace(/"/g, '&quot;').replace(/'/g, '&apos;');
        }
        return all;
    });
}

Test run:

Input: $test = "the url is http://www.google.com";// c'o"mment """<-- that quote needs to be matched
Output: $test = "the url is http://www.google.com";// c&apos;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched

Input: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched
Output: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched

Input: $test //= "the url is http://www.google.com"; //c'o"mment """<-- that quote needs to be matched
Output: $test //= &quot;the url is http://www.google.com&quot;; //c&apos;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched
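
The second test case comes back unchanged because the regex has no branch for "#" comments, which the question's own attempt did try to handle. A minimal sketch of one way to extend it, assuming "#" should be treated exactly like "//" (the function name escapeCommentQuotes is made up for this example, and the string groups are made non-capturing so the comment is the only capture):

```javascript
// Sketch only: same idea as t_replace above, plus a "#.*" branch.
// escapeCommentQuotes is a hypothetical name, not from the original answer.
function escapeCommentQuotes(data) {
    var re = /"(?:\\[\s\S]|[^"])*"|'(?:\\[\s\S]|[^'])*'|(\/\/.*|#.*|\/\*[\s\S]*?\*\/)/g;
    return data.replace(re, function (all, comment) {
        // comment is undefined when one of the string branches matched
        if (comment !== undefined) {
            return all.replace(/"/g, '&quot;').replace(/'/g, '&apos;');
        }
        return all; // string literal: leave untouched
    });
}

console.log(escapeCommentQuotes("$test = 'the url is http://www.google.com';# c'o\"mment"));
// prints: $test = 'the url is http://www.google.com';# c&apos;o&quot;mment
```

As with the original, text starting with "//" that is not inside a string (the third test case) is still treated as a comment up to the end of the line.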

I have to admit, this regex took me a while to come up with, but I'm pretty sure it will do what you are looking for:

<script>
var str = "$test = \"the url is http://www.google.com\";// comment \"\"\"<-- that quote needs to be matched";
var reg = /^(?:(([^"'\/]*(("[^"]*")|('[^']*'))?[\s]*)?\/\/[^"]*))"/g;

while( str !== (str = str.replace( reg, "$1&quot;") ) );

console.log( str );

</script>

Here's what's going on in the regex:

^ # start with the beginning of the line
(?:           # don't capture the following
 (
  ([^"'\/]*   # start the line with any character as long as it isn't a string or a comment
   (
    ("[^"]*") # grab a double quoted string
    |         # OR 
    ('[^']*') # grab a single quoted string
   )?         # but...we don't HAVE to match a string
   [\s]*      # allow for any amount of whitespace
  )?          # but...we don't HAVE to have any characters before the comment begins
  \/\/        # match the start of a comment
  [^"]*       # match any number of characters that isn't a double quote
 )            # end capture group 1 (this is the $1 used in the replacement)
)             # end the non-capturing group
"             # match your commented double quote

The while loop in the JavaScript just keeps finding and replacing until it can't find any more matches in the given line.
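
One thing to watch for: the pattern above allows only whitespace between a closing quote and the "//", so a line with a ";" in between, like the sample string, is not matched. A minimal sketch of the same replace-until-stable loop with that gap relaxed (the function name and the modified pattern are my own assumptions, not part of the answer):

```javascript
// Sketch only: escapeQuotesInLineComment and this pattern are hypothetical.
// The prefix may contain any mix of plain characters and quoted strings
// before the first "//" that sits outside a string.
function escapeQuotesInLineComment(line) {
    var re = /^((?:[^"'\/]|"[^"]*"|'[^']*')*\/\/[^"]*)"/;
    var previous;
    do {
        previous = line;
        // Each pass replaces the first remaining double quote in the comment.
        line = line.replace(re, "$1&quot;");
    } while (line !== previous);
    return line;
}

var str = "$test = \"the url is http://www.google.com\";// comment \"\"\"<-- that quote needs to be matched";
console.log(escapeQuotesInLineComment(str));
// prints: $test = "the url is http://www.google.com";// comment &quot;&quot;&quot;<-- that quote needs to be matched
```

Like the original, this sketch stops at a single "/" outside a string (a division, for example) and only handles double quotes inside the comment.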

Don't forget that PHP comments can also take the form /* this is a comment */, which can span multiple lines.
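
For illustration, a block comment can be matched with a lazy [\s\S]*?, which crosses line breaks where "." does not. This is only a sketch: the function name findBlockComments and the sample input are made up, and on its own the pattern does not know about strings, so a "/*" inside a quoted string would also match.

```javascript
// Sketch only: findBlockComments is a hypothetical helper.
function findBlockComments(source) {
    // [\s\S] matches any character, including newlines;
    // the lazy *? stops at the first closing "*/".
    return source.match(/\/\*[\s\S]*?\*\//g) || [];
}

var php = '$a = 1; /* first\n   "quoted" */ $b = 2; /* second */';
console.log(findBlockComments(php).length);
// prints: 2
```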

This site may be of interest to you:

http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript

JavaScript's regular expression engine has historically lacked native lookbehind support (it was only added in ES2018). What you may be able to do is start at the end of a line and look backward to capture any characters that follow a semicolon + optional whitespace + //. So something like:

;\s*\/\/(.+)$

This may not capture everything.

You may also want to look for a PHP syntax checker written in JavaScript (or another language). I think Komodo Edit's PHP syntax checker may be written in JavaScript. If so, it may give you insight into how to strip out everything but the comments, since a syntax checker has to make sure the PHP code is valid, comments and all. The same can be said of syntax highlighters. Here are two other links:

http://ecoder.quintalinda.com/

http://www.webdesignbooth.com/9-useful-javascript-syntax-highlighting-scripts/
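
A small sketch of that idea, using \s* for the optional whitespace, shows both what it captures and where it goes wrong (commentAfterSemicolon is a made-up helper name):

```javascript
// Sketch only: commentAfterSemicolon is a hypothetical helper.
function commentAfterSemicolon(line) {
    var m = /;\s*\/\/(.+)$/.exec(line);
    return m ? m[1] : null; // text after the "//", or null if nothing matched
}

console.log(commentAfterSemicolon('$a = "http://x.com"; // say "hi"'));
// prints:  say "hi"   (with the leading space)
console.log(commentAfterSemicolon('// whole-line comment'));
// prints: null   (there is no semicolon before the comment)
console.log(commentAfterSemicolon('$a = "x; // not a comment";'));
// prints:  not a comment";   (false positive: the match starts inside the string)
```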
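
To give a rough idea of what such tools do internally, here is a minimal hand-written scanner that walks the source one character at a time and tracks whether it is inside a string or a comment. It is a sketch under my own assumptions, not code taken from any of the linked projects; the function name is made up, and it ignores heredocs and other PHP syntax.

```javascript
// Sketch only: escapeQuotesInComments is a hypothetical name.
function escapeQuotesInComments(src) {
    var out = "";
    var i = 0;
    while (i < src.length) {
        var ch = src[i];
        var two = src.slice(i, i + 2);
        var end;
        if (ch === '"' || ch === "'") {
            // String literal: find the closing quote, skipping escaped characters.
            end = i + 1;
            while (end < src.length && src[end] !== ch) {
                end += (src[end] === "\\") ? 2 : 1;
            }
            end = Math.min(end + 1, src.length);
            out += src.slice(i, end); // copied verbatim
            i = end;
        } else if (two === "//" || two === "/*" || ch === "#") {
            if (two === "/*") {
                end = src.indexOf("*/", i + 2);
                end = (end === -1) ? src.length : end + 2;
            } else {
                end = src.indexOf("\n", i);
                if (end === -1) { end = src.length; }
            }
            // Comment: escape both kinds of quotes.
            out += src.slice(i, end).replace(/"/g, "&quot;").replace(/'/g, "&apos;");
            i = end;
        } else {
            out += ch;
            i += 1;
        }
    }
    return out;
}

console.log(escapeQuotesInComments("$test = \"the url is http://www.google.com\";// c'o\"mment"));
// prints: $test = "the url is http://www.google.com";// c&apos;o&quot;mment
```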

To complement @Thai's answer, which I found very good, I would like to add a bit more:

In this example, using the original regex, only the last character of each quoted string is captured: https://regex101.com/r/CoxFvJ/2

So I modified it a bit to capture the full quoted content, and to give a more verbose and generic example: https://regex101.com/r/CoxFvJ/3

So the final regex is:

/"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g
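
A short sketch of how the three capture groups can be read back, assuming the regex is written with single backslash escapes as it would appear in a JavaScript literal (listTokens and the sample input are made up for this example):

```javascript
// Sketch only: listTokens is a hypothetical helper.
function listTokens(source) {
    // Group 1: double-quoted content, group 2: single-quoted content, group 3: comment.
    var re = /"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
    var tokens = [];
    var m;
    while ((m = re.exec(source)) !== null) {
        if (m[3] !== undefined) {
            tokens.push({ type: "comment", text: m[3] });
        } else if (m[1] !== undefined) {
            tokens.push({ type: "double", text: m[1] });
        } else {
            tokens.push({ type: "single", text: m[2] });
        }
    }
    return tokens;
}

var tokens = listTokens("$a = \"the url is http://www.google.com\"; $b = 'it'; // done");
console.log(tokens.map(function (t) { return t.type + ": " + t.text; }).join("\n"));
// prints:
// double: the url is http://www.google.com
// single: it
// comment: // done
```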

Big thanks to Thai for getting me unstuck.
