
Java - Extract strings with Regex

I have this string:

String myString ="A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";

and I need to extract these 3 substrings:
1234
06:30
07:45

If I use this regex, "\\d{2}\\:\\d{2}", I'm only able to extract the first hour, 06:30:

Pattern depArrHours = Pattern.compile("\\d{2}\\:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
matcher.find();                        // moves to the first match, 06:30
String firstHour = matcher.group(0);   // "06:30"
String secondHour = matcher.group(1);  // throws IndexOutOfBoundsException: No group 1

matcher.group(1) throws an exception.
Also, I don't know how to extract 1234. This string can change, but it always comes after 'XX~ '.
Do you have any idea how to match these strings with regular expressions?

UPDATE

Thanks to Adam's suggestion, I now have this regex that matches my string:

Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*?(\\d{1,2}:\\d{2}).*?(\\d{1,2}:\\d{2})"); // reluctant .*? keeps the leading zero of each hour

I get the number and the 2 hours with matcher.group(1), matcher.group(2) and matcher.group(3).
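A minimal sketch of why the quantifier in front of each hour matters, assuming the whole line is checked with matches(). The class GreedyVsReluctant and its helper groups(...) are hypothetical names for illustration, and the input is the sample string from the question. With a greedy .* the engine pushes each hour as far to the right as it can, so the optional first digit of \d{1,2} ends up outside the group; a reluctant .*? instead stops at the first position where a time can start.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    // Hypothetical helper: returns groups 1..3 if the whole input matches, otherwise null.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null; // no match: the groups cannot be read
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";

        // Greedy: each .* grabs as much as it can, leaving \d{1,2} only one digit.
        String greedy = ".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})";
        // Reluctant: each .*? stops at the first place where the next time can match.
        String reluctant = ".*XX~ (\\d{3,4}).*?(\\d{1,2}:\\d{2}).*?(\\d{1,2}:\\d{2})";

        System.out.println(String.join(", ", groups(greedy, myString)));    // 1234, 6:30, 7:45
        System.out.println(String.join(", ", groups(reluctant, myString))); // 1234, 06:30, 07:45
    }
}
```

Writing each hour as \d{2} would also keep the leading zero, at the cost of rejecting single-digit hours.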

The Matcher.group(int) method takes a single integer argument: the capturing group index, starting from 1. Index 0 is special and means "the entire match". A capturing group is created with a pair of parentheses, "(...)", and whatever that parenthesized part of the pattern matches is captured. Groups are numbered from left to right by the position of their opening parenthesis (again starting from 1), which means that groups can be nested. Since your regular expression has no parentheses, there can be no group 1.
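A small sketch of the numbering rule, using a made-up nested pattern on the time "06:30"; GroupNumbering and allGroups(...) are hypothetical names chosen for illustration. The outer group opens first, so it is group 1, and the hour and minutes become groups 2 and 3. A pattern without parentheses only has group 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {

    // Hypothetical helper: returns group(0) .. group(groupCount()) for a full match, or null.
    static String[] allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Numbered by opening parenthesis, left to right:
        // 1 = ((\d{2}):(\d{2}))  the outer group, which opens first
        // 2 = (\d{2})            the hour
        // 3 = (\d{2})            the minutes
        String[] g = allGroups("((\\d{2}):(\\d{2}))", "06:30");
        System.out.println(String.join(" | ", g)); // 06:30 | 06:30 | 06 | 30

        // No parentheses: only group 0 exists, so group(1) would be out of range.
        String[] none = allGroups("\\d{2}:\\d{2}", "06:30");
        System.out.println(none.length); // 1
    }
}
```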

The javadoc of the Pattern class covers the regular expression syntax.

If you are looking for a pattern that may recur several times, you can call Matcher.find() repeatedly until it returns false. Calling Matcher.group(0) once on each iteration then returns what matched that time.
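A minimal sketch of the find() loop applied to the question's string and the question's own time pattern (without the unnecessary escape before the colon); FindAll and findAll(...) are hypothetical names. Each find() call resumes after the previous match, so both hours are collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAll {

    // Hypothetical helper: collects every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {           // false once there are no more matches
            matches.add(m.group(0)); // group 0 = the text matched on this iteration
        }
        return matches;
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(findAll("\\d{2}:\\d{2}", myString)); // [06:30, 07:45]
    }
}
```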

If you want to build one big regular expression that matches everything at once (which I believe is what you want), put a pair of capturing parentheses around each of the three things you want to capture, call Matcher.matches(), and then call Matcher.group(n) with n equal to 1, 2 and 3 respectively. Of course, Matcher.matches() may also return false, in which case the pattern did not match and you can't retrieve any of the groups.
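A sketch of the "one big regex" approach that also handles the false case. It assumes the layout of the question's sample (a number right after "XX~ ", then two zero-padded hh:mm times later in the line); OneBigRegex, LINE and extract(...) are hypothetical names, and the regex is only one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneBigRegex {

    // Assumed layout: a number after "XX~ ", then two hh:mm times further along the line.
    private static final Pattern LINE =
            Pattern.compile(".*XX~ (\\d+).*?(\\d{2}:\\d{2}).*?(\\d{2}:\\d{2}).*");

    // Hypothetical helper: returns {number, firstTime, secondTime}, or null when the line does not match.
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null; // calling m.group(n) here would throw IllegalStateException
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        String[] parts = extract(myString);
        if (parts != null) {
            System.out.println(String.join(" ", parts)); // 1234 06:30 07:45
        }
        System.out.println(extract("no times here") == null); // true
    }
}
```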

In your example, what you probably want is for the pattern to match some preceding text, then start a capturing group, match the digits, end the capturing group, and so on. I don't know enough about your exact input format, but here is an example.

Let's say I had strings of the form:

Eat 12 carrots at 12:30
Take 3 pills at 01:15

And I wanted to extract the quantities and times. My regular expression would look something like:

\w+ (\d+) [\w ]+ (\d{2}:\d{2})

The code would look something like:

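import java.util.regex.Matcher;
import java.util.regex.Pattern;

String oneline = "Eat 12 carrots at 12:30"; // one line of input
Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
Matcher m = p.matcher(oneline);
if (m.matches()) { // the whole line must fit the pattern
    System.out.println("The quantity is " + m.group(1)); // 12
    System.out.println("The time is " + m.group(2));     // 12:30
}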

The regular expression means "a string consisting of a word, a space, one or more digits (captured in group 1), a space, a run of word characters and spaces, a space, and then a time (captured in group 2)". The time part assumes that the hour is always zero-padded to 2 digits. I would give an example closer to what you are looking for, but the description of the possible input is a little vague.
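A self-contained sketch that runs the same pattern over both sample lines plus one extra, made-up line with an unpadded hour ("1:15") to show the zero-padding assumption in action; QuantityAndTime and parse(...) are hypothetical names. The third line does not match, so parse returns null for it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantityAndTime {

    // word, space, quantity (group 1), space, words/spaces, space, zero-padded hh:mm (group 2)
    private static final Pattern LINE = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");

    // Hypothetical helper: returns {quantity, time}, or null if the line does not have this shape.
    static String[] parse(String oneline) {
        Matcher m = LINE.matcher(oneline);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] lines = { "Eat 12 carrots at 12:30", "Take 3 pills at 01:15", "Take 3 pills at 1:15" };
        for (String oneline : lines) {
            String[] r = parse(oneline);
            if (r == null) {
                System.out.println("No match: " + oneline);
            } else {
                System.out.println("The quantity is " + r[0] + ", the time is " + r[1]);
            }
        }
    }
}
```

Relaxing the time to (\d{1,2}:\d{2}) would accept the unpadded hour as well.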
