
Regular expression wildcard matching with Java's split method

I know similar questions have been asked before, but I want to do a custom operation and I don't know how to go about it. I want to split a string with a regular expression where I know the starting characters and the ending characters, like this:

String myString="Google is a great search engine<as:...s>";

Here <as: and s> are the opening and closing delimiters; the ... between them is dynamic, and I can't predict its value.

I want to split the string on everything from the opening <as: to the closing s>, including the dynamic text in between.

Like:

myString.split("<as:/*s>");

Something like that. I also want to get every occurrence of <as:...s> in the string. I know this can be done with regex, but I've never done it before. I need a simple and neat way to do this. Thanks in advance.

Rather than using .split(), I would extract the text with Pattern and Matcher. This approach matches everything between <as: and s> and captures it in a group; group 1 then holds the text you want.

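import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args)
{
    final String myString = "Google is a great search engine<as:Some stuff heres>";

    // Group 1 captures everything between "<as:" and the final "s>".
    Pattern pat = Pattern.compile("^[^<]+<as:(.*)s>$");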

    Matcher m = pat.matcher(myString);
    if (m.matches()) {
        System.out.println(m.group(1));
    }
}

Output:

Some stuff here

If you need the text at the beginning, you can put it in a capture group as well.
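As a rough sketch of that idea, the pattern can carry a second capture group for the leading text. The class name PrefixAndTag and the parse helper are made up for illustration, and the pattern still assumes exactly one tag at the end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class PrefixAndTag {
    // Group 1: text before the tag; group 2: text between "<as:" and the final "s>".
    private static final Pattern PAT = Pattern.compile("^([^<]+)<as:(.*)s>$");

    /** Returns {prefix, tagContents}, or null when the input does not match. */
    static String[] parse(String input) {
        Matcher m = PAT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("Google is a great search engine<as:Some stuff heres>");
        System.out.println(parts[0]); // Google is a great search engine
        System.out.println(parts[1]); // Some stuff here
    }
}
```

Because [^<]+ requires at least one character, an input that starts directly with <as: will not match this particular pattern.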

Edit: If there is more than one <as:...s> in the input, the following will gather all of them. Edit 2: improved the logic and added checks for empty strings.

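// Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
public static List<String> multiEntry(final String myString)
{
    // Every element after the first starts just past a "<as:" opener.
    String[] parts = myString.split("<as:");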

    List<String> col = new ArrayList<>();
    if (! parts[0].trim().isEmpty()) {
        col.add(parts[0]);
    }

    // Group 1: tag contents up to the first "s>"; group 2: any plain text that follows the tag.
    Pattern pat = Pattern.compile("^(.*?)s>(.*)?");
    for (int i = 1; i < parts.length; ++i) {
        Matcher m = pat.matcher(parts[i]);
        if (m.matches()) {
            for (int j = 1; j <= m.groupCount(); ++j) {
                String s = m.group(j).trim();
                if (! s.isEmpty()) {
                    col.add(s);
                }
            }
        }
    }

    return col;
}

Output:

[Google is a great search engine, Some stuff heress, Here is Facebook, More Stuff, Something else at the end]

Edit 3: This approach calls find() in a loop to do the parsing. It also uses optional capture groups.

public static void looping()
{
    final String myString="Google is a great search engine"
            + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
            + "Something else at the end" +
            "<as:Stuffs>" +
            "<as:Yet More Stuffs>";

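    // Group 1: optional plain text before a tag; group 3: optional tag contents (group 2 is the whole tag).
    Pattern pat = Pattern.compile("([^<]+)?(<as:(.*?)s>)?");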

    Matcher m = pat.matcher(myString);
    List<String> col = new ArrayList<>();

    while (m.find()) {
        String prefix = m.group(1);
        String contents = m.group(3);

        if (prefix != null) { col.add(prefix); }
        if (contents != null) { col.add(contents); }
    }

    System.out.println(col);
}

Output:

[Google is a great search engine, Some stuff heress, Here is Facebook, More Stuff, Something else at the end, Stuff, Yet More Stuff]
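If only the tag contents, or only the text around the tags, are needed, a shorter variant may be enough. The sketch below is an alternative I am assuming fits the question, not the code above: the class TagScan and its two methods are made up for illustration. It uses one lazy pattern, <as:(.*?)s>, both with find() and with Pattern.split().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class TagScan {
    // Lazy match: stops at the first "s>" after each "<as:".
    private static final Pattern TAG = Pattern.compile("<as:(.*?)s>");

    /** Collects the text inside every <as:...s> occurrence, in order. */
    static List<String> tagContents(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /** Splits the input on whole tags; trailing empty strings are dropped by split. */
    static List<String> textBetweenTags(String input) {
        return Arrays.asList(TAG.split(input));
    }

    public static void main(String[] args) {
        String s = "Google is a great search engine"
                + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
                + "Something else at the end"
                + "<as:Stuffs>"
                + "<as:Yet More Stuffs>";
        System.out.println(tagContents(s));     // [Some stuff heress, More Stuff, Stuff, Yet More Stuff]
        System.out.println(textBetweenTags(s)); // [Google is a great search engine, Here is Facebook, Something else at the end]
    }
}
```

Two caveats: the lazy match ends at the first s> it sees, so tag contents that themselves contain s> are cut short, and . does not match line breaks unless Pattern.DOTALL is passed to compile.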

Additional edit: I wrote some quick test cases (with a very hacky helper class) to help validate. These all pass with the updated multiEntry:

public static void main(String[] args)
{
    Input[] inputs = {
            new Input("Google is a great search engine<as:Some stuff heres>", 2),
            new Input("Google is a great search engine"
                    + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
                    + "Something else at the end" +
                    "<as:Stuffs>" +
                    "<as:Yet More Stuffs>" +
                    "ending", 8),
            new Input("Google is a great search engine"
                            + "<as:Some stuff heresss>Here is Facebook<as:More Stuffs>"
                            + "Something else at the end" +
                            "<as:Stuffs>" +
                            "<as:Yet More Stuffs>", 7),
            new Input("No as here", 1),       
            new Input("Here is angle < input", 1),
            new Input("Angle < plus <as:Stuff in as:s><as:Other stuff in as:s>", 3),
            new Input("Angle < plus <as:Stuff in as:s><as:Other stuff in as:s>blah", 4),
            new Input("<as:To start with anglass>Some ending", 2),
    };


    List<String> res;
    for (Input inp : inputs) {
        res = multiEntry(inp.inp);
        if (res.size() != inp.cnt) {
            System.err.println("FAIL: " + res.size() 
            + " did not match exp of " + inp.cnt
            + " on " + inp.inp);
            System.err.println(res);
            continue;
        }
        System.out.println(res);
    }
}
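
The helper class itself is not shown. A minimal sketch that fits how the test uses it (a constructor taking a string and an int, plus the fields inp and cnt) might look like this; the exact shape of the original class is an assumption.

```java
// Hypothetical reconstruction of the test helper; only the constructor
// signature and the field names inp and cnt are taken from the test code.
class Input {
    final String inp; // the string passed to multiEntry
    final int cnt;    // expected number of entries in the returned list

    Input(String inp, int cnt) {
        this.inp = inp;
        this.cnt = cnt;
    }
}
```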

