
How do I use regex to find patterns that occur in non-standard orders?

I need to read a string, extract numerical values of different types, and group them. However, these numbers may appear in any order. For example, I have two types of liquids (toxic and non-toxic). The string will contain between zero and n occurrences of either type, in no guaranteed order, but I want to be able to sum up each type. Example input string:

10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic

should return groupings of:

10ml toxic, 15 ml toxic, 30ml toxic, 25 ml non-toxic, 40 ml non-toxic

because I want to be able to add them up to get a total of 55ml toxic and 65ml non-toxic.

How do I write a regular expression pattern that groups these out?

I have messed around with using ? to be non-greedy, but that doesn't seem to work with numerical values.

By using regex you can group them like this:

String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
Pattern pattern= Pattern.compile("\\d+[\\s]?ml toxic");
Matcher matcher= pattern.matcher(data);
while(matcher.find()) {
    System.out.println(matcher.group());
}

The result will be:

 10ml toxic
 15 ml toxic
 30ml toxic

You can do the same for non-toxic, then calculate the sum of each group.
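A minimal sketch of that last step, assuming the same input string: the digits are wrapped in a capturing group so they can be parsed and summed, and the pattern is built once per label. The class name ToxicTotals and the helper sumFor are hypothetical names introduced for illustration, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToxicTotals {
    // Sums the millilitre values that are directly followed by "ml <label>".
    // Pattern.quote keeps any special characters in the label from being
    // read as regex syntax.
    public static int sumFor(String data, String label) {
        Pattern pattern = Pattern.compile("(\\d+)\\s?ml " + Pattern.quote(label));
        Matcher matcher = pattern.matcher(data);
        int total = 0;
        while (matcher.find()) {
            total += Integer.parseInt(matcher.group(1)); // group 1 holds only the digits
        }
        return total;
    }

    public static void main(String[] args) {
        String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
        System.out.println("toxic = " + sumFor(data, "toxic"));         // toxic = 55
        System.out.println("non-toxic = " + sumFor(data, "non-toxic")); // non-toxic = 65
    }
}
```

Because the pattern requires the label immediately after "ml ", a search for toxic does not pick up the non-toxic entries.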

The regex you should use is

(\\d+(?=\\s*ml\\s*toxic))|(\\d+(?=\\s*ml\\s*non-toxic))

\\d+ matches one or more digits.

(?=...) is a positive lookahead: the text that follows must match the enclosed pattern, but it is not included in the matched result.

\\s*ml\\s*toxic matches any amount of whitespace, then ml, then any amount of whitespace again, then toxic.

| is the alternation (or) operator in regex, so

|(\\d+(?=\\s*ml\\s*non-toxic)) can be added to find the non-toxic volume.

For each match, Matcher.group(1) holds the value if the left half of the expression matched and Matcher.group(2) holds it if the right half matched; the group that did not take part is null.

String pattern = "(\\d+(?=\\s*ml\\s*toxic))|(\\d+(?=\\s*ml\\s*non-toxic))";
String str = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";

Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);

int sum1 = 0;
int sum2 = 0;
while(m.find()){
    if (m.group(1)!=null)
        sum1 += Integer.parseInt(m.group(1));
    if (m.group(2)!=null)
        sum2 += Integer.parseInt(m.group(2));
}
System.out.println("Toxic = " + sum1);
System.out.println("Non-Toxic = " + sum2);

This will output

Toxic = 55
Non-Toxic = 65

And don't forget to import

import java.util.regex.Matcher;
import java.util.regex.Pattern;
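To check what the lookahead in this pattern does, here is a small sketch that compares a consuming pattern with its lookahead form on a shortened input. The class LookaheadDemo and the helper matches are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns every match of regex in input, exactly as Matcher.group() reports it.
    public static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "10ml toxic 25 ml non-toxic";
        // Consuming pattern: the unit and label are part of the match.
        System.out.println(matches("\\d+\\s*ml\\s*toxic", input));         // [10ml toxic]
        // Lookahead: the unit and label must follow, but only the digits are matched.
        System.out.println(matches("\\d+(?=\\s*ml\\s*toxic)", input));     // [10]
        System.out.println(matches("\\d+(?=\\s*ml\\s*non-toxic)", input)); // [25]
    }
}
```

Since the lookahead form returns only the digits, each result can be passed straight to Integer.parseInt.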

Another possibility:

String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
Pattern pattern= Pattern.compile("(\\d+)[\\s]*(ml)\\s+((?:non-)?toxic)");
Matcher matcher= pattern.matcher(data);
while(matcher.find()) {
    System.out.println(matcher.group(1) + matcher.group(2) + " " + matcher.group(3));
}

This will output:

10ml toxic
15ml toxic
25ml non-toxic
30ml toxic
40ml non-toxic

You still need to group the results by matcher.group(3):

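// Reuses `pattern` and `data` from the snippet above.
// Also needs java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map.
Map<String, List<String>> map = new HashMap<>();
Matcher m = pattern.matcher(data);   // fresh matcher under a new name, so nothing is redeclared
while (m.find()) {
    String value = m.group(1);       // the digits
    String unit = m.group(2);        // "ml"
    String key = m.group(3);         // "toxic" or "non-toxic"
    // Create the list for this key on first use, then append to it.
    map.computeIfAbsent(key, k -> new ArrayList<>()).add(value + unit);
}
System.out.println(map);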

Output:

{toxic=[10ml, 15ml, 30ml], non-toxic=[25ml, 40ml]}
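Since the question asks for a total per type, the grouping above can also accumulate sums directly. The following is a sketch under the same input assumptions; the class VolumeTotals and the method totals are hypothetical names, and a TreeMap is used here only so that the printed key order is predictable.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VolumeTotals {
    // Group 1: the digits. Group 2: "toxic" or "non-toxic".
    private static final Pattern VOLUME =
            Pattern.compile("(\\d+)\\s*ml\\s+((?:non-)?toxic)");

    // Returns the total millilitres per label, with keys in alphabetical order.
    public static Map<String, Integer> totals(String data) {
        Map<String, Integer> totals = new TreeMap<>();
        Matcher matcher = VOLUME.matcher(data);
        while (matcher.find()) {
            int ml = Integer.parseInt(matcher.group(1));
            // merge stores ml for a new key, or adds it to the existing total.
            totals.merge(matcher.group(2), ml, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String data = "10ml toxic abcdeljsdg 15 ml toxic alkewag 25 ml non-toxic lkjasdg 30ml toxic 40 ml non-toxic";
        System.out.println(totals(data)); // {non-toxic=65, toxic=55}
    }
}
```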
