
Match for multiple lines of text between delimiters in Java

How can I match multiple lines of text between delimiters in Java?

The question is best explained by an example:

...
unimportant text
EndOfEntry
Key=Value
unimportant text
maybe a few lines of unimportant text
AnotherKey=AnotherValue
EndOfEntry
more unimportant text
...

In the above, I want to match on Key=Value.*AnotherKey=AnotherValue, with both having appeared together in one entry. I simply want to know whether or not the pattern appears; I don't need to replace anything.

However, with the same desired match, suppose I'm given multiple entries, such as:

...
unimportant text
EndOfEntry
Key=Value
unimportant text
maybe a few lines of unimportant text
AnotherKey=NotMyValue
EndOfEntry
RandomKey=Value
unimportant text
maybe a few lines of unimportant text
AnotherKey=AnotherValue
EndOfEntry
more unimportant text
...

I wouldn't want the above to be matched successfully, because we don't see exactly Key=Value and AnotherKey=AnotherValue inside a single "entry". Instead, we see Key=Value in the first entry and AnotherKey=AnotherValue in the second entry.

I've been trying with a regex like the following (and of course [\S\s] can be replaced by a dot with the DOTALL option for Pattern):

Key=Value[\S\s]*?AnotherKey=AnotherValue

but of course that matches both. I've also tried:

Key=Value[^EndOfEntry]*?AnotherKey=AnotherValue

but that doesn't work because then there's no dot and we're not matching the newlines at all.

Is there one single regex that can match precisely what I'm looking for? Would it simplify things to strip newlines first, or to use some other two-step processing (which I'm trying to avoid simply for the sake of education)?

You should simply use:

\bKey=Value\b(?:(?!EndOfEntry).)*?\bAnotherKey=AnotherValue\b

(with the DOTALL flag, as you suggested in your question).

You can experiment with it live on regex101.


How it works:

I've basically replaced your .* with this expression: (?:(?!EndOfEntry).)*? , which matches roughly any run of characters that doesn't contain EndOfEntry. Before each character is consumed, the negative lookahead (?!EndOfEntry) checks that the delimiter does not start at that position.
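As a minimal sketch of that idea (the class name, method names, and sample strings here are illustrative assumptions, not part of the original answer), the following compares the lookahead-guarded dot with the character class from the question. It also suggests why [^EndOfEntry] fails: a character class excludes the individual characters E, n, d, O, f, t, r and y rather than the word EndOfEntry, so it stops at the first "n" in "unimportant".

```java
import java.util.regex.Pattern;

public class TemperedDotDemo {
    // Each '.' is consumed only if "EndOfEntry" does not start at that position.
    static final Pattern TEMPERED = Pattern.compile(
            "Key=Value(?:(?!EndOfEntry).)*?AnotherKey=AnotherValue", Pattern.DOTALL);

    // The character class from the question: it excludes the single characters
    // E, n, d, O, f, t, r, y, not the word "EndOfEntry".
    static final Pattern CHAR_CLASS = Pattern.compile(
            "Key=Value[^EndOfEntry]*?AnotherKey=AnotherValue");

    static boolean tempered(String input) {
        return TEMPERED.matcher(input).find();
    }

    static boolean charClass(String input) {
        return CHAR_CLASS.matcher(input).find();
    }

    public static void main(String[] args) {
        // Illustrative inputs, shorter than the examples in the question.
        String sameEntry = "Key=Value\nunimportant text\nAnotherKey=AnotherValue\nEndOfEntry\n";
        String splitEntries = "Key=Value\nEndOfEntry\nAnotherKey=AnotherValue\nEndOfEntry\n";

        System.out.println(tempered(sameEntry));    // true: no delimiter between the two keys
        System.out.println(tempered(splitEntries)); // false: the dot cannot step over EndOfEntry
        System.out.println(charClass(sameEntry));   // false: 'n' in "unimportant" is excluded
    }
}
```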

In addition, to avoid matching the pair RandomKey=Value and AnotherKey=AnotherValue (since RandomKey=Value would also match Key=Value, for example), I've added another small tweak:

I've surrounded your pairs with \b, which asserts that we're at a word boundary (\s, which matches any whitespace character, is an alternative), so we only have a match when the entire word matches.
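To illustrate the effect of \b in isolation, here is a small sketch; the class and helper names are made up for this example.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without \b, "Key=Value" is found inside "RandomKey=Value".
        System.out.println(finds("Key=Value", "RandomKey=Value"));       // true
        // With \b, the 'K' must start a word, so "RandomKey=Value" no longer matches.
        System.out.println(finds("\\bKey=Value\\b", "RandomKey=Value")); // false
        // A standalone line still matches: the line break after "Value" is a boundary.
        System.out.println(finds("\\bKey=Value\\b", "Key=Value\n"));     // true
    }
}
```

Note that \b only checks for a transition between word and non-word characters, so a line such as Key=Value-old would still match; if values can contain such characters, a stricter anchor may be needed.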


Here's a piece of Java code that runs the regex I'm suggesting against your examples:

import java.util.regex.Pattern;

public class EntryMatchExample {
    public static void main(String[] args) {
        // DOTALL lets '.' match line terminators, so the match can span lines.
        final Pattern pattern = Pattern.compile(
                "\\bKey=Value\\b(?:(?!EndOfEntry).)*?\\bAnotherKey=AnotherValue\\b",
                Pattern.DOTALL);

        // The two keys appear in different entries: should not match.
        final String invalid = "unimportant text\n" +
                "EndOfEntry\n" +
                "Key=Value\n" +
                "unimportant text\n" +
                "maybe a few lines of unimportant text\n" +
                "AnotherKey=NotMyValue\n" +
                "EndOfEntry\n" +
                "RandomKey=Value\n" +
                "unimportant text\n" +
                "maybe a few lines of unimportant text\n" +
                "AnotherKey=AnotherValue\n" +
                "EndOfEntry\n" +
                "more unimportant text";

        // Both keys appear inside the same entry: should match.
        final String valid = "unimportant text\n" +
                "EndOfEntry\n" +
                "Key=Value\n" +
                "unimportant text\n" +
                "maybe a few lines of unimportant text\n" +
                "AnotherKey=AnotherValue\n" +
                "EndOfEntry\n" +
                "more unimportant text";

        System.out.println(pattern.matcher(invalid).find());
        System.out.println(pattern.matcher(valid).find());
    }
}

Output:

false
true
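The question also asks whether two-step processing would simplify things. The answer above does not cover that, but one possible sketch is to split the input on the delimiter and then check each entry's lines separately. The class name and the anyEntryHasBoth helper are hypothetical, and the sketch assumes each key sits on its own line with no surrounding whitespace.

```java
import java.util.Arrays;
import java.util.List;

public class EntrySplitDemo {
    // Hypothetical helper: true if some single entry contains the line `first`
    // followed, later in the same entry, by the line `second`.
    static boolean anyEntryHasBoth(String input, String first, String second) {
        return Arrays.stream(input.split("EndOfEntry"))
                .map(entry -> Arrays.asList(entry.split("\\R"))) // \R matches any line break (Java 8+)
                .anyMatch(lines -> followedBy(lines, first, second));
    }

    static boolean followedBy(List<String> lines, String first, String second) {
        int i = lines.indexOf(first); // exact whole-line comparison
        return i >= 0 && lines.subList(i + 1, lines.size()).contains(second);
    }

    public static void main(String[] args) {
        String invalid = "Key=Value\nAnotherKey=NotMyValue\nEndOfEntry\n"
                + "RandomKey=Value\nAnotherKey=AnotherValue\nEndOfEntry\n";
        String valid = "Key=Value\nunimportant text\nAnotherKey=AnotherValue\nEndOfEntry\n";

        System.out.println(anyEntryHasBoth(invalid, "Key=Value", "AnotherKey=AnotherValue")); // false
        System.out.println(anyEntryHasBoth(valid, "Key=Value", "AnotherKey=AnotherValue"));   // true
    }
}
```

Because whole lines are compared, RandomKey=Value cannot be mistaken for Key=Value here, at the cost of more code than the single regex.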
