How can I split a string into groups?

I'm trying to work out how to split a string into groups. I don't think the split(regex) method will suffice on its own.

I have String complexStatement = "(this && that)||(these&&those)||(me&&you)"; and I would like to get an array of this form:

"(this && that)","(these&&those)","(me&&you)"

If I had "(5+3)*(2+5)+(9)" then I'd like to have "(5+3)","(2+5)","(9)".
(Bonus points if you can somehow keep the join information, e.g. *, +, ||.)

Is this possible for an arbitrary string input? I'm playing with a StringTokenizer but I haven't quite gotten to grips with it yet.

You can use the code below:

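    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "(this && that)||(these&&those)||(me&&you)";
    // Match "(", then one or more characters other than ")", then ")"
    Pattern pattern = Pattern.compile("\\(([^\\)]+)\\)");
    Matcher m = pattern.matcher(str);
    while (m.find()) {
        // group(0) is the whole match, parentheses included; group(1) is the text inside
        System.out.println(m.group(0));
    }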

The pattern \\(([^\\)]+)\\) matches anything within a pair of parentheses, which looks like what you want.

Edit:

To capture the content between ) and ( instead (that is, the joining operators), replace the regular expression with \\)([^\\(]+)\\( and read m.group(1).
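One way to get both the groups and the join information in a single pass is to remember where each match ends and take the text up to the start of the next match. This is only a minimal sketch that assumes the parentheses are not nested; the class name RegexGroupSplitter and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexGroupSplitter {
    // Same pattern as in the answer: "(", one or more non-")" characters, ")".
    private static final Pattern GROUP = Pattern.compile("\\(([^\\)]+)\\)");

    /**
     * Returns the parenthesised groups in order of appearance.
     * The text found between consecutive groups is appended to 'operators'.
     * Assumption: parentheses are not nested.
     */
    static List<String> split(String input, List<String> operators) {
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(input);
        int previousEnd = -1; // stays -1 until the first group has been seen
        while (m.find()) {
            if (previousEnd >= 0) {
                // Everything between the previous ")" and this "(" is the joiner.
                operators.add(input.substring(previousEnd, m.start()).trim());
            }
            groups.add(m.group());
            previousEnd = m.end();
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> operators = new ArrayList<>();
        List<String> groups = split("(this && that)||(these&&those)||(me&&you)", operators);
        System.out.println(groups);
        System.out.println(operators);
    }
}
```

For the input from the question this prints [(this && that), (these&&those), (me&&you)] followed by [||, ||].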

I think you would be better off implementing the parsing yourself instead of depending on any ready-made methods.

Here is my suggestion. I am assuming the format of the input will always be like the following:

(value1+operator+value2)+operator+(value3+operator+value4)+........

(Here the operators can be different, and + is just showing concatenation.)

If the above assumption is true, then you can do the following.

  1. Use a stack.
  2. While reading the original string, push all the characters onto the stack.
  3. Now pop the characters off the stack one by one, using the following logic:
     a. If you get a ), start adding characters to a string.
     b. If you get a (, add it to the string; you now have one token. Add the token to the array.
     c. After getting a (, skip characters until the next ).

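NB: this is just pseudocode with primitive thinking. Because a stack pops characters in reverse order, the tokens (and the characters within each token) come out back to front and have to be reversed.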
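The stack steps above could be sketched in Java roughly as follows. This is an illustration only: it assumes the flat, non-nested input format described in this answer, and the class name StackGroupSplitter is hypothetical. Since popping reverses the input, the sketch prepends each character to the current token and each token to the result list so that everything ends up in the original order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackGroupSplitter {
    /** Splits flat input such as "(5+3)*(2+5)+(9)" into its parenthesised tokens. */
    static List<String> split(String input) {
        // Step 2: push every character onto the stack.
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : input.toCharArray()) {
            stack.push(c);
        }

        List<String> tokens = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        boolean inside = false; // true between a popped ")" and its "("

        // Step 3: pop the characters, which arrive in reverse order.
        while (!stack.isEmpty()) {
            char c = stack.pop();
            if (c == ')') {
                inside = true;          // 3a: start collecting
            }
            if (!inside) {
                continue;               // 3c: skip operators between groups
            }
            token.insert(0, c);         // prepend to undo the reversal
            if (c == '(') {             // 3b: the token is complete
                tokens.add(0, token.toString());
                token.setLength(0);
                inside = false;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("(5+3)*(2+5)+(9)"));
    }
}
```

With nested parentheses this sketch produces broken tokens, because the first ( it pops ends the token regardless of depth.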

If you want to capture the groups defined only by parentheses at the outermost level, you are outside of the world of regular expressions and will need to parse the input. StinePike's approach is good; another one (in messy pseudocode) is as follows:

insides = []
outsides = []
nesting_level = 0
string = ""
while not done_reading_input():
    char = get_next_char()
    if char == '(':
        if nesting_level == 0:
            # text collected since the last group lies outside any parentheses
            outsides.add(string)
            string = ""
        nesting_level += 1
        string += char
    elif char == ')':
        string += char
        nesting_level -= 1
        if nesting_level == 0:
            # a complete outermost group, parentheses included
            insides.add(string)
            string = ""
    else:
        string += char

If the very first character in your input is a '(', you'll get an extra string in your outsides array, but you can fix that without much trouble.
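A possible Java rendering of the nesting-counter idea is sketched below. It is not the only way to do it: the class name NestingSplitter is hypothetical, the sketch keeps the parentheses in each group (as the question asks), skips empty outside strings to avoid the extra leading entry mentioned above, and rejects unbalanced input.

```java
import java.util.ArrayList;
import java.util.List;

class NestingSplitter {
    /**
     * Appends every outermost parenthesised group to 'insides' and the text
     * between groups to 'outsides'. Nested parentheses stay inside their
     * outermost group.
     */
    static void split(String input, List<String> insides, List<String> outsides) {
        StringBuilder current = new StringBuilder();
        int nestingLevel = 0;
        for (char c : input.toCharArray()) {
            if (c == '(') {
                if (nestingLevel == 0) {
                    // Leaving "outside" text; skip it if empty (e.g. input starts with "(").
                    if (current.length() > 0) {
                        outsides.add(current.toString());
                    }
                    current.setLength(0);
                }
                nestingLevel++;
                current.append(c);
            } else if (c == ')') {
                if (nestingLevel == 0) {
                    throw new IllegalArgumentException("unmatched ')'");
                }
                current.append(c);
                nestingLevel--;
                if (nestingLevel == 0) {
                    insides.add(current.toString()); // outermost group closed
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (nestingLevel != 0) {
            throw new IllegalArgumentException("unmatched '('");
        }
        if (current.length() > 0) {
            outsides.add(current.toString()); // trailing text after the last group
        }
    }

    public static void main(String[] args) {
        List<String> insides = new ArrayList<>();
        List<String> outsides = new ArrayList<>();
        split("((a+b)*c)-(d)", insides, outsides);
        System.out.println(insides);
        System.out.println(outsides);
    }
}
```

For "((a+b)*c)-(d)" the groups are ((a+b)*c) and (d), and the only outside string is -.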

If you are interested in nested parentheses then you will not be producing just two arrays as output; you will need a tree.
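To make the tree idea concrete, here is one possible shape for it, offered as a sketch rather than a full expression parser. The ParenNode class and its fields are assumptions made for this example: each node stores the full text of one parenthesised group, and its children are the groups nested directly inside it. The root node stands for the whole input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ParenNode {
    final int start;   // index of this group's "(" in the input, -1 for the root
    String text;       // full text of the group, parentheses included
    final List<ParenNode> children = new ArrayList<>();

    ParenNode(int start) {
        this.start = start;
    }

    /** Builds a tree of parenthesised groups; the root represents the whole input. */
    static ParenNode parse(String input) {
        ParenNode root = new ParenNode(-1);
        root.text = input;
        Deque<ParenNode> open = new ArrayDeque<>(); // groups not yet closed
        open.push(root);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                ParenNode node = new ParenNode(i);
                open.peek().children.add(node); // attach to the enclosing group
                open.push(node);
            } else if (c == ')') {
                if (open.size() == 1) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                ParenNode node = open.pop();
                node.text = input.substring(node.start, i + 1);
            }
        }
        if (open.size() != 1) {
            throw new IllegalArgumentException("unmatched '('");
        }
        return root;
    }

    public static void main(String[] args) {
        ParenNode root = parse("((a+b)*c)-(d)");
        for (ParenNode child : root.children) {
            System.out.println(child.text + " has " + child.children.size() + " nested group(s)");
        }
    }
}
```

The operators between sibling groups are not stored in this sketch; they could be recovered from each child's start index and text length if needed.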

 