
Using Java to find substring of a bigger string using Regular Expression

If I have a string like this:

FOO[BAR]

I need a generic way to extract the "BAR" part, so that it works no matter what text sits between the square brackets.

e.g.

FOO[DOG] = DOG
FOO[CAT] = CAT

You should be able to use non-greedy quantifiers, specifically *?. You'll probably want the following:

Pattern MY_PATTERN = Pattern.compile("\\[(.*?)\\]");

This will give you a pattern that will match your string and put the text within the square brackets in the first group. Have a look at the Pattern API Documentation for more information.

To extract the string, you could use something like the following:

Matcher m = MY_PATTERN.matcher("FOO[BAR]");
while (m.find()) {
    String s = m.group(1);
    // s now contains "BAR"
}
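To see why the non-greedy *? matters, here is a minimal sketch that runs both the non-greedy and the greedy version of the pattern over an input with two bracketed parts. The class name QuantifierDemo and the helper firstGroups are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> firstGroups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "FOO[BAR] FOO[CAT]";
        // Non-greedy: each match stops at the first closing bracket.
        System.out.println(String.join(" | ", firstGroups("\\[(.*?)\\]", input)));
        // prints: BAR | CAT
        // Greedy: a single match that runs to the last closing bracket.
        System.out.println(String.join(" | ", firstGroups("\\[(.*)\\]", input)));
        // prints: BAR] FOO[CAT
    }
}
```

With only one pair of brackets in the input, both patterns give the same result; the difference only shows once a second ] appears later in the string.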

The non-regex way:

String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf("[") + 1, input.indexOf("]"));

Alternatively, for slightly better performance/memory usage (thanks Hosam):

String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf('[') + 1, input.lastIndexOf(']'));

This is a working example:

RegexpExample.java

package org.regexp.replace;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample
{
    public static void main(String[] args)
    {
        String string = "var1[value1], var2[value2], var3[value3]";
        Pattern pattern = Pattern.compile("(\\[)(.*?)(\\])");
        Matcher matcher = pattern.matcher(string);

        List<String> listMatches = new ArrayList<String>();

        while(matcher.find())
        {
            listMatches.add(matcher.group(2));
        }

        for(String s : listMatches)
        {
            System.out.println(s);
        }
    }
}

It displays:

value1
value2
value3

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String get_match(String s, String p) {
    // returns first match of p in s for first group in regular expression 
    Matcher m = Pattern.compile(p).matcher(s);
    return m.find() ? m.group(1) : "";
}

get_match("FOO[BAR]", "\\[(.*?)\\]")  // returns "BAR"

public static List<String> get_matches(String s, String p) {
    // returns all matches of p in s for first group in regular expression 
    List<String> matches = new ArrayList<String>();
    Matcher m = Pattern.compile(p).matcher(s);
    while(m.find()) {
        matches.add(m.group(1));
    }
    return matches;
}

get_matches("FOO[BAR] FOO[CAT]", "\\[(.*?)\\]")  // returns [BAR, CAT]

If you simply need to get whatever is between [], then you can use \\[([^\\]]*)\\] like this:

Pattern regex = Pattern.compile("\\[([^\\]]*)\\]");
Matcher m = regex.matcher(str);
if (m.find()) {
    // group(1) is the text inside the brackets; group() would include the brackets
    result = m.group(1);
}

If you need it to be of the form identifier + [ + content + ], then you can limit the extraction to cases where the identifier is alphanumeric:

[a-zA-Z][a-zA-Z0-9_]*\s*\[([^\]]*)\]

This will match things like Foo [Bar], or myDevice_123["input"] for instance.
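In Java, the identifier pattern needs its backslashes doubled inside the string literal. The following is a small sketch of how it might be used; the class name IdentifierBracket and the method content are hypothetical names chosen for the example, and returning null for "no match" is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierBracket {
    // identifier, optional whitespace, then [content]; group 1 is the content.
    static final Pattern PATTERN =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*\\s*\\[([^\\]]*)\\]");

    // Returns the bracket content, or null when no identifier[content] is found.
    static String content(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(content("Foo [Bar]"));               // Bar
        System.out.println(content("myDevice_123[\"input\"]")); // "input"
        System.out.println(content("[orphan]"));                // null
    }
}
```

Because find() searches anywhere in the input, the identifier may start in the middle of a longer token; use matches() or anchors if the whole string has to have this shape.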

Main issue

The main problem is when you want to extract the content of something like this:

FOO[BAR[CAT[123]]+DOG[FOO]]

The Regex won't work and will return BAR[CAT[123 and FOO.
If we change the Regex to \\[(.*)\\] then we're OK, but if you're trying to extract the content from more complex things like:

FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]

None of the Regexes will work.

The most accurate Regex to extract the proper content in all cases would be a lot more complex, as it would need to balance [] pairs and give you their content.
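The failure modes described above can be reproduced with a few lines of Java. This is only an illustrative sketch; NestedBracketDemo and firstGroups are names invented for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedBracketDemo {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> firstGroups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String nested = "FOO[BAR[CAT[123]]+DOG[FOO]]";
        String twoGroups = nested + " = myOtherFoo[BAR[5]]";
        String negated = "\\[([^\\]]*)\\]";
        String greedy = "\\[(.*)\\]";

        // Negated class: cut off at the first ']' it meets.
        System.out.println(String.join(" | ", firstGroups(negated, nested)));
        // prints: BAR[CAT[123 | FOO

        // Greedy: fine for a single outer pair...
        System.out.println(String.join(" | ", firstGroups(greedy, nested)));
        // prints: BAR[CAT[123]]+DOG[FOO]

        // ...but it swallows everything between the first '[' and the last ']'.
        System.out.println(String.join(" | ", firstGroups(greedy, twoGroups)));
        // prints: BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]
    }
}
```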

A simpler solution

If your problem is getting complex and the content of the [] is arbitrary, you could instead balance the pairs of [] and extract the string using plain old code rather than a Regex:

String input = "FOO[BAR[CAT[123]]+DOG[FOO]]";
StringBuilder result = new StringBuilder();
int brackets = 0;
// If there is no '[', indexOf returns -1 and the loop is skipped.
for (int i = input.indexOf('['); i >= 0 && i < input.length(); i++) {
    char c = input.charAt(i);
    if (c == '[') {
        brackets++;
        if (brackets == 1)
            continue; // skip the outermost opening bracket itself
    } else if (c == ']') {
        brackets--;
        if (brackets <= 0)
            break; // reached the bracket that closes the outermost pair
    }
    result.append(c);
}
// result.toString() is now "BAR[CAT[123]]+DOG[FOO]"

This is a sketch rather than polished code, but it should be easy enough to improve upon.
What counts is that this approach lets you extract the content of the [], however complex it is.
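The same bracket-counting idea can be extended to collect every top-level [...] group in a string, which covers the "myOtherFoo" case mentioned earlier. This is one possible sketch, not the only way to do it; BalancedBrackets and topLevelGroups are hypothetical names, and silently ignoring unbalanced brackets is an assumption you may want to replace with an exception.

```java
import java.util.ArrayList;
import java.util.List;

class BalancedBrackets {
    // Returns the content of every top-level [...] group, nested brackets included.
    // A stray ']' or an unclosed '[' is ignored rather than reported.
    static List<String> topLevelGroups(String input) {
        List<String> groups = new ArrayList<>();
        int depth = 0;
        int start = -1; // index just after the current top-level '['
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
                if (depth == 0) {
                    groups.add(input.substring(start, i));
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]";
        for (String g : topLevelGroups(input)) {
            System.out.println(g);
        }
        // prints:
        // BAR[CAT[123]]+DOG[FOO]
        // BAR[5]
    }
}
```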

I think your regular expression would look like:

/FOO\[(.+)\]/

Assuming that FOO is going to be constant.

So, to put this in Java:

Pattern p = Pattern.compile("FOO\\[(.+)\\]");
Matcher m = p.matcher(inputLine);
// once m.find() returns true, m.group(1) holds the text between the brackets

String input = "FOO[BAR]";
String result = input.substring(input.indexOf("[")+1,input.lastIndexOf("]"));

This will return the value between the first '[' and the last ']':

Foo[Bar] => Bar

Foo[Bar[test]] => Bar[test]

Note: You should add error checking if the input string is not well formed.
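As a rough illustration of that error checking, the sketch below guards against missing or misordered brackets before calling substring. The class name SafeBracketSubstring, the method between, and the choice of Optional for the "not well formed" case are all assumptions made for this example.

```java
import java.util.Optional;

class SafeBracketSubstring {
    // Text between the first '[' and the last ']', or Optional.empty() when
    // the input is null, a bracket is missing, or the last ']' precedes the first '['.
    static Optional<String> between(String input) {
        if (input == null) {
            return Optional.empty();
        }
        int open = input.indexOf('[');
        int close = input.lastIndexOf(']');
        if (open < 0 || close < open) {
            return Optional.empty();
        }
        return Optional.of(input.substring(open + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(between("Foo[Bar]").orElse("<none>"));       // Bar
        System.out.println(between("Foo[Bar[test]]").orElse("<none>")); // Bar[test]
        System.out.println(between("]Foo[").orElse("<none>"));          // <none>
    }
}
```

Without the guard, an input such as "]Foo[" would make substring throw a StringIndexOutOfBoundsException.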

This also works if the string you want to parse comes from something like mYearInDB.toString(), which here is "[2013]"; it will give 2013:

Matcher n = MY_PATTERN.matcher("FOO[BAR]" + mYearInDB.toString());
while (n.find()) {
    extractedYear = n.group(1);
    // each match overwrites extractedYear, so the last one ("2013") wins
}
System.out.println("Extracted output is: " + extractedYear);

This regexp works for me:

form\[([^']*?)\]

example:

form[company_details][0][name]
form[company_details][0][common_names][1][title]

output:

Match 1
1.  company_details
Match 2
1.  company_details

Tested on http://rubular.com/

"FOO[DOG]".replaceAll("^.*?\\[|\\].*", "");

This returns only the text inside the square brackets, by removing everything outside them.
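A short sketch of how that replaceAll call behaves on a few inputs, including one without brackets (ReplaceAllDemo and strip are names made up for the example):

```java
class ReplaceAllDemo {
    // Removes everything up to and including the first '[',
    // and everything from the next ']' to the end of the string.
    static String strip(String input) {
        return input.replaceAll("^.*?\\[|\\].*", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("FOO[DOG]"));          // DOG
        System.out.println(strip("FOO[DOG] BAR[CAT]")); // DOG
        System.out.println(strip("no brackets"));       // no brackets
    }
}
```

Note the last case: when nothing matches, replaceAll returns the input unchanged, so this approach cannot tell "no brackets found" apart from a real result unless you check for brackets first.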

You can test this java sample code online: http://tpcg.io/wZoFu0

You can test this regex from here: https://regex101.com/r/oUAzsS/1

Assuming no other closing square bracket appears inside: /FOO\\[([^\\]]*)\\]/

I'd define that I want a maximum number of non-] characters between [ and ]. These need to be escaped with backslashes (and in Java, the backslashes need to be escaped again), and the definition of non-] is a character class, thus inside [ and ] (i.e. [^\\]]). The result:

FOO\\[([^\\]]+)\\]
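One practical difference between this + version and the * version of the same pattern is how empty brackets are treated. The sketch below compares the two; PlusVersusStar and firstGroup are hypothetical names, and null is used here simply to signal "no match".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusVersusStar {
    static final Pattern PLUS = Pattern.compile("FOO\\[([^\\]]+)\\]");
    static final Pattern STAR = Pattern.compile("FOO\\[([^\\]]*)\\]");

    // Returns group 1 of the first match, or null when the pattern does not match.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup(PLUS, "FOO[BAR]")); // BAR
        System.out.println(firstGroup(PLUS, "FOO[]"));    // null: + needs at least one character
        System.out.println("'" + firstGroup(STAR, "FOO[]") + "'"); // '': * accepts empty content
    }
}
```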


 