What Java regular expression do I need to match this text?
I'm trying to match the following using a regular expression in Java. I have some data separated by the two characters 'ZZ'. Each record starts with 'ZZ' and finishes with 'ZZ'. I want to match a record with no ending 'ZZ'; for example, I want to match the trailing 'ZZanychars' below (note: the *'s are not included in the string, they're just marking the bit I want to match).
ZZanycharsZZZZanycharsZZ*ZZanychars*
But I don't want the following to match because the record has ended:
ZZanycharsZZZZanycharsZZZZanycharsZZ
EDIT: To clarify things, here are the 2 test cases I am using:
// This should match and in one of the groups should be 'ZZthree'
String testString1 = "ZZoneZZZZtwoZZZZthree";
// This should not match
String testString2 = "ZZoneZZZZtwoZZZZthreeZZ";
EDIT: Adding a third test:
// This should match and in one of the groups should be 'threeZee'
String testString3 = "ZZoneZZZZtwoZZZZthreeZee";
(Edited after the post of the 3rd example)
Try:
(?!ZZZ)ZZ((?!ZZ).)++$
Demo:
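import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String[] tests = {
            "ZZoneZZZZtwoZZZZthree",
            "ZZoneZZZZtwoZZZZthreeZZ",
            "ZZoneZZZZtwoZZZZthreeZee"
        };
        // The possessive quantifier (++) stops the engine from backtracking
        // into a record once it has run into a closing ZZ.
        Pattern p = Pattern.compile("(?!ZZZ)ZZ((?!ZZ).)++$");
        for (String tst : tests) {
            Matcher m = p.matcher(tst);
            // Prints the trailing unterminated record, or "no!" if there is none.
            System.out.println(tst + " -> " + (m.find() ? m.group() : "no!"));
        }
    }
}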
To match only the final, unterminated record:
(?<=[^Z]ZZ|^)ZZ(?:(?!ZZ).)++$
The starting delimiter is two Z's, but there can be a third Z that's considered part of the data. The lookbehind ensures that you don't match a Z that's part of the previous record's ending delimiter (since an ending delimiter cannot be preceded by a non-delimiter Z). However, this assumes there will never be empty records (or records containing only a single Z), which could lead to eight or more Z's in a row:
ZZabcZZZZdefZZZZZZZZxyz
If that were possible, I would forget about trying to match the final record by itself, and instead match all of them from the beginning:
(?:ZZ(?:(?!ZZ).)*+ZZ)*+(ZZ(?:(?!ZZ).)++$)
The final, unterminated record is now captured in group #1.
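A minimal sketch comparing the two patterns above on the question's test strings and on the eight-Z input. The class name `FinalRecordDemo` and the helper names `lookbehindMatch` and `wholeStringMatch` are made up for illustration. The second helper uses `lookingAt()` to pin the match to the start of the input, which is one reading of "from the beginning" and not something the answer spells out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FinalRecordDemo {
    // Lookbehind variant: assumes records are never empty.
    static final Pattern LOOKBEHIND =
            Pattern.compile("(?<=[^Z]ZZ|^)ZZ(?:(?!ZZ).)++$");

    // Whole-string variant: consumes every terminated record first,
    // then captures the unterminated one in group 1.
    static final Pattern WHOLE_STRING =
            Pattern.compile("(?:ZZ(?:(?!ZZ).)*+ZZ)*+(ZZ(?:(?!ZZ).)++$)");

    // Returns the trailing unterminated record, or null if none is found.
    static String lookbehindMatch(String input) {
        Matcher m = LOOKBEHIND.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Same contract, but the match must begin at the start of the input
    // (assumption: lookingAt() instead of find()).
    static String wholeStringMatch(String input) {
        Matcher m = WHOLE_STRING.matcher(input);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] tests = {
            "ZZoneZZZZtwoZZZZthree",
            "ZZoneZZZZtwoZZZZthreeZZ",
            "ZZoneZZZZtwoZZZZthreeZee",
            "ZZabcZZZZdefZZZZZZZZxyz"   // contains an empty record
        };
        for (String t : tests) {
            System.out.println(t + " -> lookbehind: " + lookbehindMatch(t)
                    + ", whole string: " + wholeStringMatch(t));
        }
    }
}
```

On the last input the lookbehind variant finds nothing, because the final `ZZ` is preceded by more Z's, while the whole-string variant still captures `ZZxyz`.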
I'd suggest something like...
/ZZ(.*?)(ZZ|$)/
This will match:
ZZ - the literal string ZZ
(.*?) - anychars
(ZZ|$) - either another ZZ literal, or the end of the string
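One way to apply this pattern to the question's goal is to walk through every match and look at the second group: if it is empty, the record ran to the end of the string without a closing ZZ. A minimal sketch, using the Java string form of the pattern (without the surrounding slashes); the class name `LazyRecordScan` and the helper `unterminatedPayload` are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyRecordScan {
    static final Pattern RECORD = Pattern.compile("ZZ(.*?)(ZZ|$)");

    // Returns the payload of the record that has no closing ZZ,
    // or null if every record is terminated.
    static String unterminatedPayload(String input) {
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            // Group 2 is "ZZ" for a terminated record and "" when the
            // record was closed by the end of the string instead.
            if (m.group(2).isEmpty()) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(unterminatedPayload("ZZoneZZZZtwoZZZZthree"));
        System.out.println(unterminatedPayload("ZZoneZZZZtwoZZZZthreeZZ"));
        System.out.println(unterminatedPayload("ZZoneZZZZtwoZZZZthreeZee"));
    }
}
```

Note that group 1 holds the payload without the leading ZZ, so the first test string yields `three` and not `ZZthree`.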
^ZZ.*(?<!ZZ)$
Assert position at the beginning of the string «^»
Match the characters “ZZ” literally «ZZ»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!ZZ)»
Match the characters “ZZ” literally «ZZ»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Created with RegexBuddy
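A short sketch of how this pattern behaves in Java. It matches the whole input, not just the last record, so it is used here as a yes/no check for a missing closing ZZ. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class TrailingDelimiterCheck {
    static final Pattern NOT_CLOSED = Pattern.compile("^ZZ.*(?<!ZZ)$");

    // True when the input starts with ZZ and does not end with ZZ.
    static boolean hasUnterminatedRecord(String input) {
        return NOT_CLOSED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasUnterminatedRecord("ZZoneZZZZtwoZZZZthree"));
        System.out.println(hasUnterminatedRecord("ZZoneZZZZtwoZZZZthreeZZ"));
        System.out.println(hasUnterminatedRecord("ZZoneZZZZtwoZZZZthreeZee"));
    }
}
```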
There's one tricky part to this: ZZ is both the start token and the end token.
There's one start case (ZZ, not followed by another ZZ, which would signify that the first ZZ was actually an end token), and two end cases (ZZ at the end of the string, and ZZ followed by ZZ). The goal is to match the start case and NOT either of the end cases.
To that end, here's what I suggest:
/ZZ(?!ZZ)(.*?)(ZZ(?!(ZZ|$))|$)/
For string ZZfooZZZZbarZZbazZZ:
One more case: For ZZfoo, the beginning ZZ is okay, the foo is captured, then the regex notes that it's the end of the string, and no ZZ has occurred. Thus, ZZfoo is captured as an illegitimate match.
Let me know if this doesn't make sense, so I can make it clearer.
How about removing every match of ZZallcharsZZ, so that whatever is left over is what you want?
ZZ.*?ZZ
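A minimal sketch of that idea with `String.replaceAll`, assuming the records contain no line breaks (since `.` does not match them by default). The class name `StripTerminatedRecords` and the method `leftover` are made up for this example.

```java
class StripTerminatedRecords {
    // Deletes every terminated record; what remains is the unterminated
    // tail, or an empty string if all records were closed.
    static String leftover(String input) {
        return input.replaceAll("ZZ.*?ZZ", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + leftover("ZZoneZZZZtwoZZZZthree") + "]");
        System.out.println("[" + leftover("ZZoneZZZZtwoZZZZthreeZZ") + "]");
        System.out.println("[" + leftover("ZZoneZZZZtwoZZZZthreeZee") + "]");
    }
}
```

An empty result plays the role of "no match" here, so the caller has to check for that case.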