Matching one-line JavaScript comments (//) with re

Question

I'd like to filter out (mostly one-line) comments from (mostly valid) JavaScript using Python's re module. For example:

// this is a comment
var x = 2 // and this is a comment too
var url = "http://www.google.com/" // and "this" too
url += 'but // this is not a comment' // however this one is
url += 'this "is not a comment' + " and ' neither is this " // only this

I've been trying this for more than half an hour without any success. Can anyone please help me?

EDIT 1:

foo = 'http://stackoverflow.com/' // these // are // comments // too //

EDIT 2:

bar = 'http://no.comments.com/'

Answer 1

My regex powers had gone a bit stale, so I used your question to refresh what I remember. It became a fairly large regex, mostly because I also wanted to filter multi-line comments.

import re

reexpr = r"""
    (                           # Capture code
        "(?:\\.|[^"\\])*"       # String literal
        |
        '(?:\\.|[^'\\])*'       # String literal
        |
        (?:[^/\n"']|/[^/*\n"'])+ # Any code besides newlines or string literals
        |
        \n                      # Newline
    )|
    (/\* [^*]* \*+ (?:[^/*] [^*]* \*+)* /)   # Multi-line comment, including ones ending in **/
    |
    (?://(.*)$)                 # Comment
    $"""
rx = re.compile(reexpr, re.VERBOSE | re.MULTILINE)

This regex matches with three different subgroups: one for code and two for comment contents. Below is an example of how to extract them.

code = r"""// this is a comment
var x = 2 * 4 // and this is a comment too
var url = "http://www.google.com/" // and "this" too
url += 'but // this is not a comment' // however this one is
url += 'this "is not a comment' + " and ' neither is this " // only this

bar = 'http://no.comments.com/' // these // are // comments
bar = 'text // string \' no // more //\\' // comments
bar = 'http://no.comments.com/'
bar = /var/ // comment

/* comment 1 */
bar = open() /* comment 2 */
bar = open() /* comment 2b */// another comment
bar = open( /* comment 3 */ file) // another comment 
"""

parts = rx.findall(code)
print('*' * 80, '\nCode:\n\n', '\n'.join([x[0] for x in parts if x[0].strip()]))
print('*' * 80, '\nMulti line comments:\n\n', '\n'.join([x[1] for x in parts if x[1].strip()]))
print('*' * 80, '\nOne line comments:\n\n', '\n'.join([x[2] for x in parts if x[2].strip()]))
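
Since the question asks to filter the comments out rather than list them, the same kind of alternation can also drive re.sub. The following is only a minimal sketch, not part of the original answer: the names JS_TOKEN and strip_js_comments are made up for illustration, and the block-comment pattern is a common variant that also accepts comments ending in **/. It keeps whatever the "code" group matched and drops both kinds of comment; it is not a real JavaScript tokenizer, so unusual regex literals can still confuse it.

```python
import re

# Hypothetical name: one alternation that matches either code to keep
# (group 1) or a comment to drop (no group).
JS_TOKEN = re.compile(r"""
    (                               # group 1: code to keep
        "(?:\\.|[^"\\])*"           # double-quoted string literal
        |
        '(?:\\.|[^'\\])*'           # single-quoted string literal
        |
        (?:[^/\n"']|/[^/*\n"'])+    # code other than newlines and strings
        |
        \n                          # newline
    )
    | /\*[^*]*\*+(?:[^/*][^*]*\*+)*/    # block comment
    | //[^\n]*                          # one-line comment
    """, re.VERBOSE)


def strip_js_comments(source):
    """Return source with // and /* */ comments removed (illustrative helper)."""
    # Keep the code group when it matched; replace comments with nothing.
    return JS_TOKEN.sub(lambda m: m.group(1) or "", source)


sample = """var x = 2 // and this is a comment too
url += 'but // this is not a comment' // however this one is
bar = open( /* comment 3 */ file) // another comment
"""
print(strip_js_comments(sample))
```

Whitespace that preceded a comment is left in place, so a call to rstrip() on each line may be wanted afterwards.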

Answer 2

It might be easier to parse if you had explicit semicolons.

In any case, this works:

import re

rx = re.compile(r'.*(//(.*))$')

lines = ["// this is a comment", 
    "var x = 2 // and this is a comment too",
    """var url = "http://www.google.com/" // and "this" too""",
    """url += 'but // this is not a comment' // however this one is""",
    """url += 'this "is not a comment' + " and ' neither is this " // only this""",]

for line in lines:
    print(rx.match(line).groups())

Output of the above:

('// this is a comment', ' this is a comment')
('// and this is a comment too', ' and this is a comment too')
('// and "this" too', ' and "this" too')
('// however this one is', ' however this one is')
('// only this', ' only this')
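
One thing worth checking is how this pattern behaves on the two lines from EDIT 1 and EDIT 2 in the question. Because the leading .* is greedy, the match anchors on the last // in the line, whether or not it sits inside a string. A quick sketch (the variable names edit_1 and edit_2 are just for illustration):

```python
import re

rx = re.compile(r'.*(//(.*))$')

edit_1 = "foo = 'http://stackoverflow.com/' // these // are // comments // too //"
edit_2 = "bar = 'http://no.comments.com/'"

# Only the final "//" is captured, so most of the comment is lost.
print(rx.match(edit_1).groups())   # -> ('//', '')

# There is no comment here, but the "//" inside the URL string still matches.
print(rx.match(edit_2).groups())   # -> ("//no.comments.com/'", "no.comments.com/'")
```

So this approach suits lines where the last // is known to start the comment; for the general case the string literals have to be matched explicitly.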

I'm not sure what you're doing with the JavaScript after removing the comments, but JSMin might help. It removes comments well enough anyway, and there is an implementation in Python.


Answer 1: 7, accepted, 2010-01-26 03:23:19
Answer 2: 1, 2010-01-25 23:57:33
