
Cannot match string using regex

I am working on some regex and I wonder why this regex

"(?<=(.*?id(( *)=)\\s[\"\']))g"

does not match the string

<input id = "g" />

in Java?

java.util.regex does not support unbounded (infinite) lookbehind, as described by RegexBuddy:

The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind.

To add a little clarification from the documentation:

Therefore, many regex flavors, including those used by Perl and Python, only allow fixed-length strings. You can use any regex of which the length of the match can be predetermined. This means you can use literal text and character classes. You cannot use repetition or optional items. You can use alternation, but only if all options in the alternation have the same length.

Some regex flavors, like PCRE and Java support the above, plus alternation with strings of different lengths. Each part of the alternation must still have a finite maximum length. This means you can still not use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. These regex flavors recognize the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths. Unfortunately, the JDK 1.4 and 1.5 have some bugs when you use alternation inside lookbehind. These were fixed in JDK 1.6.
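
To illustrate the quoted rule, here is a minimal sketch of the question's lookbehind rewritten with bounded repetition. The class name, the helper name, and the upper limit of 5 whitespace characters are assumptions chosen for illustration; they do not come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Bounded variant of the question's pattern: {0,5} replaces the
    // unbounded quantifiers, so the lookbehind has a finite maximum length.
    // The limit of 5 is an arbitrary assumption.
    static final Pattern ID_VALUE =
            Pattern.compile("(?<=id\\s{0,5}=\\s{0,5}[\"'])g");

    /** Returns the index where the value "g" starts, or -1 if not found. */
    static int findValueStart(String input) {
        Matcher m = ID_VALUE.matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // prints 13
        System.out.println(findValueStart("<input id = \"g\" />"));
    }
}
```

This only works when the amount of whitespace stays within the chosen bound, which is one reason the capturing-group approach is usually less fragile.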

So a couple of people have explained why your regexp is not working (and it's fatal really; Java regular expressions can't do what you need). However, you might be wondering how you should now parse this ...

It looks like the string you're trying to parse is XML. Regex is really not a good approach to parsing XML; there is a mismatch between what can be encoded in XML and what can be matched using regular expressions. So if this is part of some XML text, maybe consider slurping it into an XML parser that you can then query for the different elements.
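
As a rough sketch of that suggestion, the JDK's built-in DOM parser can read the attribute directly. This assumes the input is well-formed XML (real-world HTML often is not), and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlIdExample {
    /**
     * Parses a well-formed XML fragment and returns the "id" attribute of
     * its root element (an empty string if the attribute is absent).
     */
    static String readId(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("id");
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // prints g
        System.out.println(readId("<input id = \"g\" />"));
    }
}
```

The parser handles the whitespace around `=` and either quote style itself, so none of that needs to be encoded in a pattern.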

For a calm and reasonable discussion of this issue, see this classic stackoverflow post: RegEx match open tags except XHTML self-contained tags.

Hope this helps!

Not only does Java not allow unbounded lookbehind, it's supposed to throw an exception if you try. The fact that you're not seeing that exception is itself a bug.
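
To see how a particular JDK treats a pattern, one option is to compile it and catch `PatternSyntaxException`. The sketch below uses a hypothetical helper named `compiles`. It makes no claim about what the question's pattern prints, since that appears to depend on the JDK version.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookbehindCheck {
    /** Returns true if the JDK accepts the pattern, false if it rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The question's pattern, with an unbounded .*? inside the lookbehind.
        String unbounded = "(?<=(.*?id(( *)=)\\s[\"\']))g";
        // A bounded rewrite; the limit of 5 is an arbitrary assumption.
        String bounded = "(?<=id\\s{0,5}=\\s{0,5}[\"'])g";

        System.out.println("unbounded accepted: " + compiles(unbounded));
        System.out.println("bounded accepted:   " + compiles(bounded));
    }
}
```

Even when an unbounded lookbehind compiles without an exception, that does not mean it will match as intended.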

You shouldn't be using lookbehind for that anyway. If you want to match the value of a certain attribute, the easiest, least troublesome approach is to match the whole attribute and use a capturing group to extract the value. For example:

String source = "<input id = \"g\" />"; 
Pattern p = Pattern.compile("\\bid\\s*=\\s*\"([^\"]*)\"");
Matcher m = p.matcher(source);
if (m.find())
{
  System.out.printf("Found 'id' attribute '%s' at position %d%n",
                    m.group(1), m.start());
}

Output:

Found 'id' attribute 'g' at position 7
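
The question's pattern also allowed single quotes. A possible extension of the same capturing-group idea, sketched here with hypothetical class and method names, captures the opening quote and uses a backreference to require the same closing quote:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdAttribute {
    // Group 1 captures the opening quote, \1 requires the same closing
    // quote, and group 2 is the attribute value.
    static final Pattern ID_ATTR =
            Pattern.compile("\\bid\\s*=\\s*([\"'])(.*?)\\1");

    /** Returns the value of the first id attribute, or null if none is found. */
    static String readId(String source) {
        Matcher m = ID_ATTR.matcher(source);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(readId("<input id = \"g\" />")); // prints g
        System.out.println(readId("<input id='abc' />"));   // prints abc
    }
}
```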

Do yourself a favor and forget about lookbehinds for a while. They're tricky even when they're not buggy, and they're really not as useful as you might expect.

java.util.regex does not support unbounded repetition in lookbehind.
