Using regex to extract a specific pattern
I'm having a hard time using regular expressions in Java, even after reading numerous tutorials online. I'm trying to extract parts of a received String to use later in my application.
Here are examples of the Strings I might receive:
53248 <CERCLE> 321 211 55 </CERCLE>
57346 <RECTANGLE> 272 99 289 186 </RECTANGLE>
The first number should be extracted as a sequence number. The word between < and > should be extracted as well, and then the sequence of numbers between the opening and closing tags.
Here is my pattern:
"(\\d+)\\s*<(\\w+)>\\s*((\\d+\\s*)+)\\s*</\\w*>.*"
Here is the code for my method so far:
public void decompose(String s) throws IllegalArgumentException {
    Pattern pattern = Pattern.compile(PATTERN);
    Matcher matcher = pattern.matcher(s);
    noSeq = Integer.parseInt(matcher.group(1));     // sequence number
    type = typesFormes.valueOf(matcher.group(2));   // shape name, e.g. CERCLE
    strCoords = matcher.group(3).split(" ");        // coordinates
}
The problem is that when I run the code, all my matcher groups are at -1 for some reason (not found, I guess). I've been banging my head against this for a while, so any suggestion is welcome :) Thanks.
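A quick way to rule out the pattern itself is to run it against both sample inputs with Matcher.matches(), which actually performs a match over the whole string. This is a minimal sketch; the class name PatternCheck and the helper accepts() are hypothetical. Both samples are accepted, which points at the method code rather than the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCheck {
    // The pattern from the question, unchanged.
    static final String PATTERN = "(\\d+)\\s*<(\\w+)>\\s*((\\d+\\s*)+)\\s*</\\w*>.*";

    // Hypothetical helper: true when the whole input matches PATTERN.
    static boolean accepts(String input) {
        Matcher matcher = Pattern.compile(PATTERN).matcher(input);
        return matcher.matches(); // this call is what actually runs the match
    }

    public static void main(String[] args) {
        String[] samples = {
            "53248 <CERCLE> 321 211 55 </CERCLE>",
            "57346 <RECTANGLE> 272 99 289 186 </RECTANGLE>"
        };
        for (String sample : samples) {
            System.out.println(accepts(sample) + " : " + sample);
        }
    }
}
```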
Simply try String#split():
String str="53248 <CERCLE> 321 211 55 </CERCLE>";
String[] array=str.split("(\\s<|>\\s)");
// simple regex (space < OR > space)
Note: try \\s+ instead if there can be one or more spaces.
Use the first three values of the array, which in this case will be 53248, CERCLE, 321 211 55.
Complete code:
String str = "53248 <CERCLE> 321 211 55 </CERCLE>";
String[] array = str.split("(\\s<|>\\s)");
int noSeq = Integer.parseInt(array[0]);
String type = array[1];
String strCoords = array[2];
System.out.println(noSeq+", "+type+", "+strCoords);
Output:
53248, CERCLE, 321 211 55
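Following the note about \\s+, here is a sketch that assumes the input may contain runs of spaces around the tags and between the coordinates. It also turns the coordinate string into ints. The class SplitSketch and its methods parts() and coords() are hypothetical names, and the input is assumed to be well formed (no bounds checking).

```java
import java.util.Arrays;

public class SplitSketch {
    // Splits "<seq> <TYPE> <coords...> </TYPE>" while tolerating several spaces
    // around the tags. Returns {sequence, type, coordinates}.
    static String[] parts(String str) {
        String[] array = str.trim().split("(\\s+<|>\\s+)");
        // array[3] holds the leftover closing tag ("/TYPE>"), which is ignored here
        return new String[] { array[0], array[1], array[2] };
    }

    // Converts the whitespace-separated coordinates into ints.
    static int[] coords(String coordPart) {
        return Arrays.stream(coordPart.trim().split("\\s+"))
                     .mapToInt(Integer::parseInt)
                     .toArray();
    }

    public static void main(String[] args) {
        String str = "53248   <CERCLE>  321  211 55   </CERCLE>";
        String[] p = parts(str);
        int noSeq = Integer.parseInt(p[0]);
        String type = p[1];
        int[] coordinates = coords(p[2]);
        System.out.println(noSeq + ", " + type + ", " + Arrays.toString(coordinates));
        // prints "53248, CERCLE, [321, 211, 55]"
    }
}
```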
You just need to tell the matcher to actually run the pattern against the input string before reading the groups. This works for me on ideone:
String s = "53248 <CERCLE> 321 211 55 </CERCLE>";
String PATTERN = "(\\d+)\\s*<(\\w+)>\\s*((\\d+\\s*)+)\\s*</\\w*>.*";
Pattern pattern = Pattern.compile(PATTERN);
Matcher matcher = pattern.matcher(s);
matcher.find(); // aye, there's the rub
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
Output was:
53248
CERCLE
321 211 55
The find() method, when successful, lets the matcher yield the information you want. From the javadocs:
If the match succeeds then more information can be obtained via the start, end, and group methods.
group() says something similarly indicative:
Returns the input subsequence captured by the given group during the previous match operation.
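To see the difference concretely, the sketch below reads group(1) once on a fresh matcher and once after find(). Without a prior match attempt, group() throws IllegalStateException; the -1 values seen in the question are likely the matcher's unset group bounds (OpenJDK's Matcher initializes them to -1 until a match succeeds). The class MatchStateDemo and its two helpers are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchStateDemo {
    static final String PATTERN = "(\\d+)\\s*<(\\w+)>\\s*((\\d+\\s*)+)\\s*</\\w*>.*";

    // Reads group(1) without any find()/matches() call; returns the exception text instead.
    static String groupWithoutMatch(String s) {
        Matcher matcher = Pattern.compile(PATTERN).matcher(s);
        try {
            return matcher.group(1); // no match operation has been performed yet
        } catch (IllegalStateException e) {
            return "IllegalStateException: " + e.getMessage();
        }
    }

    // Reads group(1) after find(); returns null when nothing was found.
    static String groupAfterFind(String s) {
        Matcher matcher = Pattern.compile(PATTERN).matcher(s);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "53248 <CERCLE> 321 211 55 </CERCLE>";
        System.out.println(groupWithoutMatch(s));
        System.out.println(groupAfterFind(s));
    }
}
```

find() looks for the pattern anywhere in the input, while matches() requires the entire input to match; with this pattern both succeed on the sample strings.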
As @2rs2ts pointed out, the problem was the missing matcher.find() call.
I would improve it further like this:
final String PATTERN = "(\\d+)\\s*<(\\w+)>\\s*([\\d\\s]+)\\s*</\\2>.*";
String s = "53248 <CERCLE> 321 211 55 </CERCLE>";
Pattern pattern = Pattern.compile(PATTERN);
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3).trim());
}
Some improvements:
- Simplified ((\\d+\\s*)+) to ([\\d\\s]+). For your purpose, it's equivalent.
- Require the opening <CERCLE> to be matched by a closing </CERCLE>, not </OTHER>. You can do that using \\2, which is a back reference to the 2nd capture group.
- Use the result of matcher.find() to check whether anything was matched.
- Trim possible trailing whitespace from the third group with .trim().
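Putting the pieces together, here is a sketch of the question's decompose method built on the improved pattern. It assumes typesFormes is an enum with CERCLE and RECTANGLE constants (the nested TypesFormes enum below is a hypothetical stand-in), and it parses the coordinates into ints instead of keeping the split strings. The class name ShapeRecord is also hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShapeRecord {
    // Hypothetical stand-in for the question's typesFormes enum.
    enum TypesFormes { CERCLE, RECTANGLE }

    // Improved pattern; the back reference \2 makes the closing tag repeat the opening one.
    private static final Pattern PATTERN =
        Pattern.compile("(\\d+)\\s*<(\\w+)>\\s*([\\d\\s]+)\\s*</\\2>.*");

    int noSeq;
    TypesFormes type;
    int[] coords;

    // Fills the fields from s, or throws IllegalArgumentException if s is not recognized.
    void decompose(String s) throws IllegalArgumentException {
        Matcher matcher = PATTERN.matcher(s);
        if (!matcher.matches()) { // run the match before touching any group
            throw new IllegalArgumentException("Unrecognized input: " + s);
        }
        noSeq = Integer.parseInt(matcher.group(1));
        type = TypesFormes.valueOf(matcher.group(2)); // throws IllegalArgumentException for unknown names
        coords = Arrays.stream(matcher.group(3).trim().split("\\s+"))
                       .mapToInt(Integer::parseInt)
                       .toArray();
    }

    public static void main(String[] args) {
        ShapeRecord shape = new ShapeRecord();
        shape.decompose("57346 <RECTANGLE> 272 99 289 186 </RECTANGLE>");
        System.out.println(shape.noSeq + ", " + shape.type + ", " + Arrays.toString(shape.coords));
    }
}
```

Because of \2, an input such as "53248 <CERCLE> 321 211 55 </RECTANGLE>" fails matches() and is rejected with an IllegalArgumentException, which fits the throws clause already present in the question's signature.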