
Need a regex to extract a substring

I am struggling to find a decent way to extract a substring from the strings below.

Inputs: 

Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014

Invoice 1200000000 of 02.04.2014/150 Details

These are the two possible formats.

Expected Output:

Invoice 1800000173 of 06/18/2014

Invoice 1200000000 of 02.04.2014

A similar question was asked here (Regex to get date from string), but it didn't help in my case. Any suggestions?

"Invoice (\d+) of (\d\d[./]\d\d[./]\d{4})"

This uses two capturing groups: the first captures one or more digits (the invoice number), the second captures the date. When the pattern is written as a Java string literal, each backslash must be escaped, e.g. \\d instead of \d.
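A minimal sketch of how the two groups might be read out in Java, with the backslashes doubled for the string literal. The class name InvoiceGroups and the helper extract are hypothetical, chosen only for this illustration; Pattern and Matcher are the standard java.util.regex classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvoiceGroups {
    // The answer's regex as a Java literal: every backslash is doubled.
    private static final Pattern INVOICE =
            Pattern.compile("Invoice (\\d+) of (\\d\\d[./]\\d\\d[./]\\d{4})");

    // Hypothetical helper: returns {invoiceNumber, date}, or null when the line does not match.
    static String[] extract(String line) {
        Matcher m = INVOICE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014",
            "Invoice 1200000000 of 02.04.2014/150 Details"
        };
        for (String line : inputs) {
            String[] parts = extract(line);
            System.out.println("number=" + parts[0] + ", date=" + parts[1]);
        }
    }
}
```

Calling m.group() instead of the numbered groups would return the whole matched text, i.e. the expected "Invoice ... of ..." line.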

Try this:

([0-9]+) of ([0-9]{1,2}[./][0-9]{1,2}[./][0-9]{1,4})

The first group contains the invoice number and the second contains the date. Note that the match does not include the word "Invoice".
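One possible variant, sketched under the assumption that Java 7 or later is available: the same two groups written as named groups, with [./] as the separator class so that dotted dates also match. Because this pattern does not include "Invoice", the expected line is rebuilt from the groups. The names NamedInvoiceGroups, toExpected, number and date are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedInvoiceGroups {
    // Same shape as the answer's pattern, but with named groups (Java 7+)
    // and [./] as the separator so both "06/18/2014" and "02.04.2014" match.
    private static final Pattern P = Pattern.compile(
            "(?<number>[0-9]+) of (?<date>[0-9]{1,2}[./][0-9]{1,2}[./][0-9]{1,4})");

    // Hypothetical helper: rebuilds "Invoice <number> of <date>", or returns null.
    static String toExpected(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        return "Invoice " + m.group("number") + " of " + m.group("date");
    }

    public static void main(String[] args) {
        System.out.println(toExpected("Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014"));
        System.out.println(toExpected("Invoice 1200000000 of 02.04.2014/150 Details"));
    }
}
```

Named groups cost a little verbosity but keep the code readable if more groups are added later.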

You can try the following:

Invoice [0-9]+ of ([0-9]{2}[\/.][0-9]{2}[\/.][0-9]{4})
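A short sketch of this pattern in Java, showing that the whole match (group 0) is already the expected output line while group 1 holds only the date. In a Java regex a slash needs no escaping, so [\/.] is written here as [/.]; that rewrite and the class name WholeMatchVsDate are this example's own choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeMatchVsDate {
    // The answer's regex as a Java literal; [\/.] simplified to [/.].
    private static final Pattern P =
            Pattern.compile("Invoice [0-9]+ of ([0-9]{2}[/.][0-9]{2}[/.][0-9]{4})");

    // Returns {wholeMatch, dateOnly}, or null when nothing matches.
    static String[] match(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] r = match("Invoice 1200000000 of 02.04.2014/150 Details");
        System.out.println(r[0]); // Invoice 1200000000 of 02.04.2014
        System.out.println(r[1]); // 02.04.2014
    }
}
```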

The regex you need is:

Invoice\\s\\d+\\sof\\s\\d+[/.]\\d+[/.]+\\d+

Then use a Pattern and Matcher to extract the required substring:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringProcesing {

    public void fetchSubString() {
        String s1 = "Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014";
        String s2 = "Invoice 1200000000 of 02.04.2014/150 Details";

        // "Invoice", the invoice number, "of", then a date separated by '/' or '.'
        Pattern p = Pattern
                .compile("Invoice\\s\\d+\\sof\\s\\d+[/.]\\d+[/.]+\\d+");
        // find() advances to each match in turn; group() returns the whole matched text
        Matcher matchS1 = p.matcher(s1);
        while (matchS1.find()) {
            System.out.println(matchS1.group());
        }
        Matcher matchS2 = p.matcher(s2);
        while (matchS2.find()) {
            System.out.println(matchS2.group());
        }
    }

    public static void main(String[] args) {
        StringProcesing obj = new StringProcesing();
        obj.fetchSubString();
    }
}

Output:

Invoice 1800000173 of 06/18/2014
Invoice 1200000000 of 02.04.2014

You can use String#replaceFirst to capture what you want and discard the rest:

String str = "Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014";
String m = str.replaceFirst("^.*(Invoice +\\d+ +of +\\d{2}[./]\\d{2}[./]\\d{4}).*$", "$1");
//=> Invoice 1800000173 of 06/18/2014

str = "Invoice 1200000000 of 02.04.2014/150 Details";
m = str.replaceFirst("^.*(Invoice +\\d+ +of +\\d{2}[./]\\d{2}[./]\\d{4}).*$", "$1");
//=> Invoice 1200000000 of 02.04.2014
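One caveat worth knowing: when the pattern does not match, replaceFirst returns the input unchanged, so a failed extraction looks like a successful one. A minimal sketch of guarding against that with String#matches first; the helper extractOrNull and the sample input "Invoice 42 of 2014-06-18" are hypothetical.

```java
class ReplaceFirstNoMatch {
    // The answer's replaceFirst pattern, unchanged.
    private static final String REGEX =
            "^.*(Invoice +\\d+ +of +\\d{2}[./]\\d{2}[./]\\d{4}).*$";

    // Hypothetical helper: returns the extracted part, or null instead of
    // silently handing back the whole input when the pattern does not match.
    static String extractOrNull(String str) {
        if (!str.matches(REGEX)) {
            return null;
        }
        return str.replaceFirst(REGEX, "$1");
    }

    public static void main(String[] args) {
        String bad = "Invoice 42 of 2014-06-18";
        // replaceFirst leaves a non-matching string untouched...
        System.out.println(bad.replaceFirst(REGEX, "$1")); // Invoice 42 of 2014-06-18
        // ...so check with matches() first if you need to detect failure.
        System.out.println(extractOrNull(bad)); // null
    }
}
```

This compiles the regex twice per call; for many lines, a precompiled Pattern with Matcher.matches() and group(1) avoids that.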

Here is a nice solution:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFun {
    public static void main(String[] args) {
        // Multi-line input: the pattern should find each invoice line in it
        String input = "Inputs: \r\n" + "\r\n" + "Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014\r\n"
                + "\r\n" + "Invoice 1200000000 of 02.04.2014/150 Details";
        // MULTILINE lets ^ match at the start of every line, not only at the start of input
        Pattern invoicePattern = Pattern.compile("^Invoice \\d{10} of \\d{1,2}[ ._/-]\\d{1,2}[ ._/-]\\d{2,4}",
                Pattern.MULTILINE);
        Matcher matcher = invoicePattern.matcher(input);
        while (matcher.find()) {
            String group = matcher.group();
            System.out.println("group=" + group);
        }
    }
}

Enabling MULTILINE mode makes the caret ^ match at the start of each line, not just at the start of the whole input.
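A small sketch comparing the same anchored pattern with and without the MULTILINE flag on a multi-line input modeled on the one above (using \n line endings for simplicity). The class MultilineDemo and its findAll helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    static final String INPUT = "Inputs:\n\n"
            + "Invoice 1800000173 of 06/18/2014/150 USD Discnt to 07/02/2014\n\n"
            + "Invoice 1200000000 of 02.04.2014/150 Details";

    static final String REGEX =
            "^Invoice \\d{10} of \\d{1,2}[ ._/-]\\d{1,2}[ ._/-]\\d{2,4}";

    // Collects every match of REGEX in INPUT, compiled with the given flags.
    static List<String> findAll(int flags) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX, flags).matcher(INPUT);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Without MULTILINE, ^ only matches at index 0, where the text is "Inputs:".
        System.out.println(findAll(0).size());                 // 0
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(findAll(Pattern.MULTILINE).size()); // 2
    }
}
```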

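The character class [ ._/-] matches any one of the common date separators: space, dot, underscore, slash or hyphen. The hyphen is placed last so that it is taken literally rather than as a range.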
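A quick sketch that tests only the date part of the pattern against a few sample strings. The class SeparatorDemo, the helper isDate and the sample dates are hypothetical.

```java
import java.util.regex.Pattern;

class SeparatorDemo {
    // Date part of the answer's regex: any of space, dot, underscore, slash
    // or hyphen between the fields (hyphen last, so it is literal).
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}[ ._/-]\\d{1,2}[ ._/-]\\d{2,4}");

    // True when the whole string is a date in this loose format.
    static boolean isDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "06/18/2014", "02.04.2014", "06-18-2014", "06 18 14", "06:18:2014" };
        for (String s : samples) {
            System.out.println(s + " -> " + isDate(s));
        }
    }
}
```

Because the two separator positions are independent classes, mixed separators such as "06/18.2014" are accepted too. If that matters, a backreference such as \\d{1,2}([ ._/-])\\d{1,2}\\1\\d{2,4} would force both separators to be the same.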
