Java regex backreference for two digits
I am working with a regex that I want to use with the replaceAll method of the String class in Java.
My regex works fine and groupCount() returns 11. However, when I try to replace my text using a backreference to the eleventh group, I get the first group with a "1" appended to it instead of group eleven.
String regex = "(>[^<]*?)((\\+?\\d{1,4}[ \\t\\f\\-\\.](\\d[ \\t\\f\\-\\.])?)?(\\(\\d{1,4}([\\s-]\\d{1,4})?\\)[\\.\\- \\t\\f])?((\\d{2,6}[\\.\\- \\t\\f])+\\d{2,6})|(\\d{6,16})([;,\\.]{1,3}\\d{3,}#?)?)([^<]*<)";
String text = "<span style=\"font-size:11.0pt\">675-441-3144;;;78888464#<o:p></o:p></span>";
String replacement = text.replaceAll(regex, "$1<a href=\"tel:$2\">$2</a>$11");
I am expecting to get the following result:
<span style="font-size:11.0pt"><a href="tel:675-441-3144;;;78888464#">675-441-3144;;;78888464#</a><o:p></o:p></span>
But the $11 backreference does not return the 11th group. It returns the first group with a "1" appended to it, so I get the following result instead:
<span style="font-size:11.0pt"><a href="tel:675-441-3144">675-441-3144</a>>1o:p></o:p></span>
Can someone please tell me how to access the eleventh group of my pattern?
Thanks.
The way to access the eleventh group of a match in the replacement string is with $11.
As the corresponding Javadoc* states:
"The replacement string may contain references to subsequences captured during the previous match: Each occurrence of ${name} or $g will be replaced by the result of evaluating the corresponding group(name) or group(g) respectively. For $g, the first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference."
So generally speaking, as long as you have at least eleven groups, "$11" will evaluate to group(11). However, if you do not have at least eleven groups, "$11" will evaluate to group(1) + "1".
* This quote is from Matcher#appendReplacement(StringBuffer,String), which is where the chain of relevant citations from String#replaceAll(String,String) leads.
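The rule quoted from the Javadoc can be checked with a small experiment. The sketch below uses two made-up patterns, one with eleven capturing groups and one with ten; the class name BackrefDemo and the sample strings are illustrative assumptions, not part of the question.

```java
public class BackrefDemo {
    // Eleven capturing groups: "$11" is a legal reference to group 11.
    static String withElevenGroups() {
        return "abcdefghijk".replaceAll("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)", "[$11]");
    }

    // Only ten capturing groups: "$11" is read as group 1 followed by a literal "1".
    static String withTenGroups() {
        return "abcdefghij".replaceAll("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)", "[$11]");
    }

    public static void main(String[] args) {
        System.out.println(withElevenGroups()); // prints [k]
        System.out.println(withTenGroups());    // prints [a1]
    }
}
```

If the output of a real pattern looks like the second case, it is worth printing groupCount() for the exact pattern that is being compiled.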
Your regex does not do what you think it does.
Let's divide your regex into its three top-level groups. These are groups 1, 2, and 11, respectively.
(>[^<]*?)
((\\+?\\d{1,4}[ \\t\\f\\-\\.](\\d[ \\t\\f\\-\\.])?)?(\\(\\d{1,4}([\\s-]\\d{1,4})?\\)[\\.\\- \\t\\f])?((\\d{2,6}[\\.\\- \\t\\f])+\\d{2,6})|(\\d{6,16})([;,\\.]{1,3}\\d{3,}#?)?)
([^<]*<)
Group 2 is the main body of your regex, and it consists of a top-level alternation over two options. These two options consist of groups 3-8 and 9-10, respectively.
((\\+?\\d{1,4}[ \\t\\f\\-\\.](\\d[ \\t\\f\\-\\.])?)?(\\(\\d{1,4}([\\s-]\\d{1,4})?\\)[\\.\\- \\t\\f])?((\\d{2,6}[\\.\\- \\t\\f])+\\d{2,6})
(\\d{6,16})([;,\\.]{1,3}\\d{3,}#?)?)
Now, given the text string, here is what is going on:
Group 1 matches ">".
The first option of group 2 matches "675-441-3144".
Because one option of the alternation has matched, group 2 as a whole is now "675-441-3144".
The match position is now immediately after "675-441-3144", which is immediately before ";;;78888464#".
Group 11 matches everything up to and including the next "<", which is all of ";;;78888464#<".
Thus, some of the content that you want to be in group 2 is actually in group 11 instead.
Do both of the following two things:
Convert the contents of group 2 from
option1|option2
to
option1(option2)?|option2
Change $11 in your replacement pattern to $12.
This will greedily match one or both options, rather than only one option. The modification to the replacement pattern is needed because we have added a group.
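The effect of the option1(option2)?|option2 template is easier to see on a deliberately simplified pattern. In the sketch below, \d+ and ;\d+ are stand-ins for the real options, and the class name AlternationDemo and the input "12;34" are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // Returns what group 1 captures on the first match of regex in input, or null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "12;34";
        // option1|option2: the alternation is done as soon as option1 succeeds.
        System.out.println(firstGroup("(\\d+|;\\d+)", input));         // prints 12
        // option1(option2)?|option2: option2 may follow option1 inside the same group.
        System.out.println(firstGroup("(\\d+(;\\d+)?|;\\d+)", input)); // prints 12;34
    }
}
```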
Now that we have modified the regex, our original "option 2" no longer makes sense. Given our new pattern template option1(option2)?|option2, it will be impossible for group 2 to match "675-441-3144;;;78888464#". This is because our original "option 1" will match all of "675-441-3144" and then stop. Our original "option 2" will then attempt to match ";;;78888464#", but will be unable to, because it begins with a mandatory capture group of 6-16 digits, (\\d{6,16}), whereas ";;;78888464#" begins with a semicolon.
Convert the contents of our original "option 2" from
(\d{6,16})([;,\.]{1,3}\d{3,}#?)?
to
([;,\.]{1,3}\d{3,}#?)?
We have one final problem to solve. Now that our original "option 2" consists only of a single group with the ? quantifier, it is possible for it to successfully match a zero-length substring. So our pattern template option1(newoption2)?|newoption2 could result in a zero-length match, which does not fulfill the intended purpose of matching phone numbers.
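The zero-length problem can be reproduced in isolation. This is a small sketch with a simplified stand-in pattern (;\d+ in place of the real option); the class name ZeroLengthDemo and the input "xyz" are illustrative assumptions.

```java
public class ZeroLengthDemo {
    // A pattern that is only an optional group matches the empty string at every position.
    static String optionalOnly() {
        return "xyz".replaceAll("(;\\d+)?", "#");
    }

    // Without the ? quantifier the pattern must consume ";" plus at least one digit.
    static String mandatory() {
        return "xyz".replaceAll(";\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(optionalOnly()); // prints #x#y#z#
        System.out.println(mandatory());    // prints xyz
    }
}
```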
Do both of the following:
Convert the contents of our new "option 2" from
([;,.]{1,3}\\d{3,}#?)?
to
[;,.]{1,3}\\d{3,}#?
Change $12 in our replacement string to $10, since we have now removed one group in two locations.
Putting everything together, our final solution is as follows.
Search regex:
(>[^<]*?)((\+?\d{1,4}[ \t\f\-\.](\d[ \t\f\-\.])?)?(\(\d{1,4}([\s-]\d{1,4})?\)[\.\- \t\f])?((\d{2,6}[\.\- \t\f])+\d{2,6})([;,\.]{1,3}\d{3,}#?)?|[;,\.]{1,3}\d{3,}#?)([^<]*<)
Replacement string:
$1<a href="tel:$2">$2</a>$10
Java:
final String searchRegex = "(>[^<]*?)((\\+?\\d{1,4}[ \\t\\f\\-\\.](\\d[ \\t\\f\\-\\.])?)?(\\(\\d{1,4}([\\s-]\\d{1,4})?\\)[\\.\\- \\t\\f])?((\\d{2,6}[\\.\\- \\t\\f])+\\d{2,6})([;,\\.]{1,3}\\d{3,}#?)?|[;,\\.]{1,3}\\d{3,}#?)([^<]*<)";
final String replacementRegex = "$1<a href=\"tel:$2\">$2</a>$10";
String text = "<span style=\"font-size:11.0pt\">675-441-3144;;;78888464#<o:p></o:p></span>";
String replacement = text.replaceAll(searchRegex, replacementRegex);
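As a quick sanity check, the final regex and replacement string can be wrapped in a method and run against the text from the question. The class name FinalSolutionCheck and the method name linkPhoneNumbers are hypothetical; the regex, replacement, and text are the ones given here.

```java
public class FinalSolutionCheck {
    static final String SEARCH_REGEX = "(>[^<]*?)((\\+?\\d{1,4}[ \\t\\f\\-\\.](\\d[ \\t\\f\\-\\.])?)?"
            + "(\\(\\d{1,4}([\\s-]\\d{1,4})?\\)[\\.\\- \\t\\f])?"
            + "((\\d{2,6}[\\.\\- \\t\\f])+\\d{2,6})([;,\\.]{1,3}\\d{3,}#?)?"
            + "|[;,\\.]{1,3}\\d{3,}#?)([^<]*<)";
    // The pattern has ten capturing groups, so $10 refers to the trailing ([^<]*<) group.
    static final String REPLACEMENT = "$1<a href=\"tel:$2\">$2</a>$10";

    static String linkPhoneNumbers(String html) {
        return html.replaceAll(SEARCH_REGEX, REPLACEMENT);
    }

    public static void main(String[] args) {
        String text = "<span style=\"font-size:11.0pt\">675-441-3144;;;78888464#<o:p></o:p></span>";
        System.out.println(linkPhoneNumbers(text));
    }
}
```

For the sample text, the whole of "675-441-3144;;;78888464#" ends up inside the link, which is the result the question asks for.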
Well, after trying to do it with replaceAll without success, I had to implement the replacement method myself:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Place this method inside a class.
public static String parsePhoneNumbers(String html) {
    StringBuilder regex = new StringBuilder(120);
    regex.append("(>[^<]*?)(")
         .append("((\\+?\\d{1,4}[ \\t\\f\\-\\.](\\d[ \\t\\f\\-\\.])?)?")
         .append("(\\(\\d{1,4}([\\s-]\\d{1,4})?\\)[\\.\\- \\t\\f])?")
         .append("((\\d{2,6}[\\.\\- \\t\\f])+\\d{2,6})|(\\d{6,16})")
         .append("([;,\\.]{1,3}\\d{3,}#?)?)")
         .append(")+([^<]*<)");
    StringBuilder mutableHtml = new StringBuilder(html.length());
    Pattern pattern = Pattern.compile(regex.toString());
    Matcher matcher = pattern.matcher(html);
    int start = 0;
    while (matcher.find()) {
        // Copy the text between the previous match and this one unchanged.
        mutableHtml.append(html.substring(start, matcher.start()));
        // Note: group 2 is repeated with +, so group(2) holds only its last repetition.
        mutableHtml.append(matcher.group(1)).append("<a href=\"tel:")
                   .append(matcher.group(2)).append("\">").append(matcher.group(2))
                   // The last group is read via groupCount() instead of a numbered backreference.
                   .append("</a>").append(matcher.group(matcher.groupCount()));
        start = matcher.end();
    }
    // Copy whatever follows the last match.
    mutableHtml.append(html.substring(start));
    return mutableHtml.toString();
}
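The same manual loop can also be written with Matcher#appendReplacement and Matcher#appendTail, which take care of copying the unmatched text. The following is only a sketch under simplifying assumptions: the pattern is a hypothetical, much-reduced stand-in for the phone-number regex (it accepts only NNN-NNN-NNNN between ">" and "<"), and the class name AppendReplacementDemo is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    // Hypothetical simplified pattern: ">" + NNN-NNN-NNNN + "<".
    private static final Pattern PHONE = Pattern.compile("(>)(\\d{3}-\\d{3}-\\d{4})(<)");

    static String linkPhoneNumbers(String html) {
        Matcher m = PHONE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Groups are read with group(int), so there is no "$11"-style ambiguity.
            String link = m.group(1) + "<a href=\"tel:" + m.group(2) + "\">"
                    + m.group(2) + "</a>" + m.group(3);
            // quoteReplacement keeps any "$" or "\" in the built text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        // Append the text after the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkPhoneNumbers("<b>675-441-3144</b>"));
    }
}
```

StringBuffer is used here because the StringBuffer overload of appendReplacement is the one cited in the Javadoc footnote.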