Java regex pattern matcher

I have a string of the following format:

String name = "A|DescA+B|DescB+C|DescC+...X|DescX+"

So the repeating pattern is ?|?+ , and I don't know how many repetitions there will be. The part I want to extract is the part before each | , so for my example I want to extract a list (an ArrayList, for example) that will contain:

[A, B, C, ... X]

I have tried the following pattern:

(.+)\\|.*\\+

but that doesn't work the way I want it to. Any suggestions?

To convert this into a list, you can do the following:

String name = "A|DescA+B|DescB+C|DescC+X|DescX+";
Matcher m = Pattern.compile("([^|]+)\\|.*?\\+").matcher(name);
List<String> matches = new ArrayList<String>();
while (m.find()) {
    matches.add(m.group(1));
}

This gives you the list:

[A, B, C, X]

Note the ? in the middle: it prevents the second part of the regex from consuming the entire string, since it makes the * lazy instead of greedy.
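To make the lazy/greedy difference concrete, here is a minimal sketch that runs both variants of the pattern over the same input. The class name LazyVsGreedy and the helper firstGroups are made up for this illustration; only the two patterns come from the answers on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> firstGroups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "A|DescA+B|DescB+C|DescC+X|DescX+";
        // Lazy .*? stops at the first '+', so find() yields one match per entry.
        System.out.println(firstGroups("([^|]+)\\|.*?\\+", name)); // [A, B, C, X]
        // Greedy .* runs to the last '+', so the whole string is one single match.
        System.out.println(firstGroups("([^|]+)\\|.*\\+", name));  // [A]
    }
}
```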

You are consuming any character ( . ), and that includes the | . So the engine goes on munching everything, then backtracks only as far as the last | in the string, which leaves the group holding almost the entire input.
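A quick way to check what the pattern from the question actually captures is to run it and print group 1. This is only a small sketch; the class name GreedyCapture and the helper firstGroup are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyCapture {
    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "A|DescA+B|DescB+C|DescC+X|DescX+";
        // The greedy (.+) swallows everything up to the LAST '|' in the input.
        System.out.println(firstGroup("(.+)\\|.*\\+", name));
        // prints A|DescA+B|DescB+C|DescC+X
    }
}
```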

So, try to match any character but | like this:

"([^|]+)\\|.*\\+"

And if it fits, make sure your all-but-| part is at the beginning of the string using ^ , and that there's a + at the end of the string using $ :

"^([^|]+)\\|.*\\+$"

UPDATE: Tim Pietzcker makes a good point: since you are already matching until you find a | , you could just as well match the rest of the string and be done with it:

"^([^|]+).*\\+$"

UPDATE2: By the way, if you want to simply get the first part of the string, you can simplify things with:

myString.split("\\|")[0]
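For comparison, the anchored regex and the split approach can be put side by side. Both return only the first entry, not the full list asked for in the question. This is a sketch; the class and method names (AnchoredFirstPart, viaRegex, viaSplit) are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredFirstPart {
    private static final Pattern ANCHORED = Pattern.compile("^([^|]+).*\\+$");

    // Returns the text before the first '|', or null if the input
    // does not end with '+' (the anchored pattern then fails to match).
    static String viaRegex(String input) {
        Matcher m = ANCHORED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Same result for well-formed input, without validating the trailing '+'.
    static String viaSplit(String input) {
        return input.split("\\|")[0];
    }

    public static void main(String[] args) {
        String name = "A|DescA+B|DescB+C|DescC+X|DescX+";
        System.out.println(viaRegex(name)); // A
        System.out.println(viaSplit(name)); // A
    }
}
```

The regex version also checks the overall shape of the input, while split simply cuts at the first | and returns whatever comes before it.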

Another idea: Find all characters between + (or start of string) and | :

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?<=^|[+])[^|]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
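The same lookbehind pattern can also be used without an explicit loop. The sketch below assumes Java 9 or later, where Matcher.results() is available; the class name LookbehindStream and the method extract are made up for the example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LookbehindStream {
    // Matches a run of non-'|' characters that starts at the beginning
    // of the input or directly after a '+'.
    private static final Pattern ENTRY_NAME = Pattern.compile("(?<=^|[+])[^|]+");

    // Assumes Java 9+: Matcher.results() streams every match in order.
    static List<String> extract(String input) {
        return ENTRY_NAME.matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(extract("A|DescA+B|DescB+C|DescC+X|DescX+"));
        // prints [A, B, C, X]
    }
}
```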

I think the simplest solution is to split on \\+ and then apply the pattern (.+?)\\|.* to each part to extract the group you need.
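A rough sketch of that split-then-match idea might look like the following. The class name SplitThenMatch and the method extract are hypothetical; parts that contain no | are silently skipped here, which is a choice made for this example rather than something the answer specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitThenMatch {
    // Lazy (.+?) captures the text before the first '|' of a single entry.
    private static final Pattern ENTRY = Pattern.compile("(.+?)\\|.*");

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        // split drops the trailing empty string produced by the final '+'.
        for (String part : input.split("\\+")) {
            Matcher m = ENTRY.matcher(part);
            if (m.matches()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("A|DescA+B|DescB+C|DescC+X|DescX+"));
        // prints [A, B, C, X]
    }
}
```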
