How do I use regex to find patterns that occur in non-standard orders?
I need to read a string, extract numerical values of different types, and group them. However, these numbers may appear in any order. For example, I have two types of liquid (toxic and non-toxic). The string will contain between zero and n occurrences of either type, in no guaranteed order, but I want to be able to sum up each type. Example input string:
10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic
should return groupings of:
10ml toxic, 15 ml toxic, 30ml toxic, 25 ml non-toxic, 40 ml non-toxic
because I want to be able to add them up to get a total of 55ml toxic and 65ml non-toxic.
How do I write a regular expression pattern to group these out?
I have messed around with using ? to make the match non-greedy, but that doesn't seem to work with numerical values.
By using regex you can group them like this:
String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
Pattern pattern = Pattern.compile("\\d+[\\s]?ml toxic");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.println(matcher.group());
}
The result will be:
10ml toxic
15 ml toxic
30ml toxic
You can do the same with non-toxic. Then you can go on to calculate the sum of each group.
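As a rough sketch of that last step, assuming the same input string, both searches can share one helper that captures the digits of each match and adds them up. The class name VolumeSums and the method sumMatches are made-up names for this illustration, and the capturing group around \\d+ is an addition to the pattern shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VolumeSums {
    // Hypothetical helper: sums the number captured by group 1 of every match.
    public static int sumMatches(String regex, String data) {
        Matcher matcher = Pattern.compile(regex).matcher(data);
        int total = 0;
        while (matcher.find()) {
            total += Integer.parseInt(matcher.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
        // "ml toxic" cannot match "ml non-toxic", so the two patterns do not overlap.
        int toxic = sumMatches("(\\d+)\\s?ml toxic", data);
        int nonToxic = sumMatches("(\\d+)\\s?ml non-toxic", data);
        System.out.println("toxic = " + toxic + "ml, non-toxic = " + nonToxic + "ml");
    }
}
```

This scans the string once per type, which is simple but means a third liquid type would need a third pattern.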
The regex you should use is (written as a Java string literal):
(\\d+(?=\\s*ml\\s*toxic))|(\\d+(?=\\s*ml\\s*non-toxic))
\\d+ matches one or more digits.
(?=...) is a positive lookahead: it requires the enclosed phrase to follow the digits, but does not include that phrase in the matched text.
\\s*ml\\s*toxic matches any amount of whitespace, then ml, then any amount of whitespace again, then toxic.
| is the or operator in regex, so |(\\d+(?=\\s*ml\\s*non-toxic)) can be added to find the non-toxic volume.
Matcher.group(1) will contain values that matched the left half of the expression, and Matcher.group(2) values that matched the right half. For any single match, the group belonging to the other half is null.
String pattern = "(\\d+(?=\\s*ml\\s*toxic))|(\\d+(?=\\s*ml\\s*non-toxic))";
String str = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
int sum1 = 0; // running total for toxic
int sum2 = 0; // running total for non-toxic
while (m.find()) {
    if (m.group(1) != null)
        sum1 += Integer.parseInt(m.group(1));
    if (m.group(2) != null)
        sum2 += Integer.parseInt(m.group(2));
}
System.out.println("Toxic = " + sum1);
System.out.println("Non-Toxic = " + sum2);
This will output
Toxic = 55
Non-Toxic = 65
And don't forget to import
import java.util.regex.Matcher;
import java.util.regex.Pattern;
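To make the zero-width behaviour of the lookahead concrete, here is a small sketch that is not part of the answer itself: it compares the left half of the expression with a variant that has no lookahead, and reports what each one actually consumed. The class name LookaheadDemo and the helper firstMatchSpan are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns "matchedText@start-end" for the first match, or null if there is none.
    public static String firstMatchSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group() + "@" + m.start() + "-" + m.end();
    }

    public static void main(String[] args) {
        String input = "15 ml toxic";
        // With the lookahead, only the digits are part of the match.
        System.out.println(firstMatchSpan("\\d+(?=\\s*ml\\s*toxic)", input)); // 15@0-2
        // Without it, the unit and the label are consumed as well.
        System.out.println(firstMatchSpan("\\d+\\s*ml\\s*toxic", input));     // 15 ml toxic@0-11
    }
}
```

Because the lookahead consumes nothing, m.group(1) in the code above can be passed straight to Integer.parseInt without trimming off the unit.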
Another possibility:
String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
Pattern pattern = Pattern.compile("(\\d+)[\\s]*(ml)\\s+((?:non-)?toxic)");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.println(matcher.group(1) + matcher.group(2) + " " + matcher.group(3));
}
This will output:
10ml toxic
15ml toxic
25ml non-toxic
30ml toxic
40ml non-toxic
You still need to group the results by matcher.group(3):
// Needs java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map.
// Use this loop in place of the printing loop above.
Map<String, List<String>> map = new HashMap<>();
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    String value = matcher.group(1);
    String unit = matcher.group(2);
    String key = matcher.group(3); // "toxic" or "non-toxic"
    List<String> list = map.get(key);
    if (list == null) {
        list = new ArrayList<>();
        map.put(key, list);
    }
    list.add(value + unit);
}
System.out.println(map);
Output:
{toxic=[10ml, 15ml, 30ml], non-toxic=[25ml, 40ml]}
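Since the original goal was a total per type, a variation on the loop above can accumulate integers instead of collecting strings. The following is only a sketch using a slightly simplified form of the same pattern; the class name TotalsByType and the method totals are illustrative names. It uses a LinkedHashMap rather than a HashMap (an assumption of this sketch, not part of the answer) so that the keys print in order of first appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TotalsByType {
    // Group 1: the number, group 2: "toxic" or "non-toxic".
    private static final Pattern PATTERN =
            Pattern.compile("(\\d+)\\s*ml\\s+((?:non-)?toxic)");

    public static Map<String, Integer> totals(String data) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        Matcher matcher = PATTERN.matcher(data);
        while (matcher.find()) {
            int value = Integer.parseInt(matcher.group(1));
            // merge() stores the value for a new key, or adds it to the existing total.
            sums.merge(matcher.group(2), value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
        System.out.println(totals(data)); // {toxic=55, non-toxic=65}
    }
}
```

Keying the map on the captured label means a single pass over the string handles both types, and the totals match the 55ml and 65ml asked for in the question.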