
JavaScript regex to match anything between single quotes, double quotes and regex slashes

I'm trying to match anything between double quotes, single quotes, or regex slashes, in other words anything that JavaScript tokenizes as a string or regex literal. So far, what I have come up with is:

/"[^\\"\n]*(\\"[^\\"\n]*)*"|'[^\\'\n]*(\\'[^\\'\n]*)*'|\/[^\\\/\n]*(\\\/[^\\\/\n]*)*\//

But there are a couple of problems with this, as you can see here:

http://goo.gl/4Yn9pR

Basically, this shouldn't match 1+2/3+4/5, since that isn't a regex. Also,
Dont match "Match here\\" Dont match" should match the first part and not the second (that's true for single quotes and regexes too).

How should this be written?

Edit: If it's not possible to differentiate between 1+2/3+4/5, /*comment*/ and /regex/ using regular expressions, how would I solve just the Dont match "Match here\\" Dont match" problem?

The trick to matching C-like escaped strings is this:

" (\\. | [^"]) * "

That is:

 - quote
 - repeat (
    - one escaped char
    - or not a quote
   )
 - quote

The same works with single quotes. An illustration in Python, since JS regexes are ugly:

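import re

# Sample text with plain, escaped-quote and escaped-backslash strings.
test = r"""
    foo "bar" and "bar\"bar" and "bar\\bar" and "bar \\"
    foo 'bar' and 'bar\'bar' and 'bar\\bar' and 'bar \\'
"""

# (?x) turns on verbose mode, so whitespace inside the pattern is ignored.
rr = r"""(?x)
    " (\\. | [^"]) * "
    |
    ' (\\. | [^']) * '
"""

print(re.sub(rr, '@@', test))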

> foo @@ and @@ and @@ and @@
> foo @@ and @@ and @@ and @@
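
For comparison, here is a rough JavaScript sketch of the same pattern. JS has no verbose flag, so the pattern is written without spaces. The names sample, quoted and replaced are made up for this illustration.

```javascript
// Same sample text as the Python illustration, without the indentation.
// String.raw keeps every backslash in the text literal.
const sample = String.raw`foo "bar" and "bar\"bar" and "bar\\bar" and "bar \\"
foo 'bar' and 'bar\'bar' and 'bar\\bar' and 'bar \\'`;

// quote, then any number of (escaped char | non-quote), then quote
const quoted = /"(\\.|[^"])*"|'(\\.|[^'])*'/g;

const replaced = sample.replace(quoted, "@@");
console.log(replaced);
// foo @@ and @@ and @@ and @@
// foo @@ and @@ and @@ and @@
```

The escaped-char alternative is listed first on purpose, so a backslash is always consumed together with the character it escapes before the "not a quote" alternative gets a chance.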

It might be necessary to add newlines to the [^"] group, i.e. [^"\n], so that a match cannot run past the end of a line.
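
A minimal sketch of why excluding newlines can matter, assuming the input may contain an unterminated string. The source text and the names loose and strict are invented for this example.

```javascript
// An unterminated string on the first line, a proper string on the second.
const source = 'var a = "oops;\nvar b = "ok";';

// Newlines allowed inside the string: the match runs across the line break.
const loose = /"(\\.|[^"])*"/g;
// Newlines excluded: the unterminated string is skipped.
const strict = /"(\\.|[^"\n])*"/g;

const looseMatches = source.match(loose);
const strictMatches = source.match(strict);

console.log(looseMatches);  // one match spanning both lines: "oops;\nvar b = "
console.log(strictMatches); // [ '"ok"' ]
```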

Do note that this expression is quite forgiving and allows many constructs that aren't valid JavaScript. See https://stackoverflow.com/a/13800082/989121 for a complete and accurate implementation.

Just figured it out. I was very close. Here's the solution:

/"[^\\"\n]*(\\["\\][^\\"\n]*)*"|'[^\\'\n]*(\\['\\][^\\'\n]*)*'|\/[^\\\/\n]*(\\[\/\\][^\\\/\n]*)*\//

DEMO

It's very similar to thg435's answer, but I think it's a little more performant because it doesn't backtrack as much.

What I was missing: when looking for an escaped quote, I should also have been looking for an escaped backslash, so I changed \\" to \\["\\]. thg435's answer instead accepts any character after a backslash, which is valid but can use up more states in the regex engine.
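
To make the difference concrete, here is a small sketch that runs the double-quote branch before and after the fix against the test line from the question. It also checks two caveats that appear to follow from the pattern as written: escapes other than \" and \\ are not covered, and the slash branch still treats division as a regex. All variable names are made up for this illustration.

```javascript
// Double-quote branch only, before and after the fix.
const original = /"[^\\"\n]*(\\"[^\\"\n]*)*"/g;   // only \" counts as an escape
const fixed = /"[^\\"\n]*(\\["\\][^\\"\n]*)*"/g;  // \" and \\ count as escapes

// Test line from the question; String.raw keeps both backslashes as-is.
const line = String.raw`Dont match "Match here\\" Dont match"`;

const before = line.match(original); // picks up the wrong part: " Dont match"
const after = line.match(fixed);     // picks up the intended part: "Match here\\"
console.log(before, after);

// Caveat: other escapes such as \t are not in ["\\],
// so a string containing one is not matched at all.
const other = String.raw`x = "a\tb"`.match(fixed);
console.log(other); // null

// Caveat: the slash branch cannot tell division from a regex literal.
const slashes = /\/[^\\\/\n]*(\\[\/\\][^\\\/\n]*)*\//g;
const division = "1+2/3+4/5".match(slashes);
console.log(division); // [ '/3+4/' ]
```

If strings with other escapes need to match, widening ["\\] to any character (as in thg435's \\. form) is one option, at the cost the answer mentions.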
