
Split a string by period, but string contains float numbers

I have a string made up of names (without spaces) separated by periods. Each token (after a period) can start with [a-zA-Z_], with a [ (in which case it ends with a ]), or with a $ (in which case it ends with a $).

Examples:

  • House.Car.[0].Flower
  • House.Car.$something$
  • House.Car2.$4.45$.[0]
  • House.Car2.$abc.def$.[0]

So I need to split the string by period, but in the last two examples I don't want to split the 4.45 (or abc.def). Anything surrounded by $ should not be split.

For the last two examples I just want an array like this:

  • House
  • Car2
  • $4.45$ //fixed, thanks Sabuj Hassan
  • [0]

or

  • House
  • Car2
  • $abc.def$
  • [0]

I have tried to use regex, but I got it completely wrong.


I was just informed that after the closing $ there could be another string surrounded by < and >, which can again contain dots that I should not split on:

  • House.Car.$abc.def$<ghi.jk>.[0].bla

And I need to get it like:

  • House
  • Car
  • $abc.def$<ghi.jk>
  • [0]
  • bla

Thanks for your help.

You are better off collecting the results by "walking" the string with .find():

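// Requires java.util.ArrayList, java.util.List,
// java.util.regex.Matcher and java.util.regex.Pattern.

// Note the alternation: a $...$ token (dots allowed inside) is tried first,
// then any run of non-dot characters.
private static final Pattern PATTERN
    = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");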

public List<String> matchesForInput(final String input)
{
    final Matcher m = PATTERN.matcher(input);
    final List<String> matches = new ArrayList<>();

    while (m.find())
        matches.add(m.group());

    return matches;
}
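
As a quick check, here is a minimal self-contained sketch that wraps the pattern above in a runnable class and applies it to the two examples from the question. The class name FindWalkDemo and the main method are illustrative assumptions, not part of the original answer. Note that this pattern does not cover the question's later update: a <...> part following a $...$ token would still be split on its dots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only for illustration.
class FindWalkDemo {
    // Same pattern as above: a $...$ token first, otherwise a run of non-dot characters.
    private static final Pattern PATTERN
        = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

    static List<String> matchesForInput(final String input) {
        final Matcher m = PATTERN.matcher(input);
        final List<String> matches = new ArrayList<>();
        // Each successful find() resumes where the previous match ended,
        // so the separating dots are simply skipped.
        while (m.find())
            matches.add(m.group());
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(matchesForInput("House.Car2.$4.45$.[0]"));
        // prints [House, Car2, $4.45$, [0]]
        System.out.println(matchesForInput("House.Car2.$abc.def$.[0]"));
        // prints [House, Car2, $abc.def$, [0]]
    }
}
```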

It will be easier with Pattern/Matcher, I believe. Raw regex:

\$[^$]+\$|\[[^\]]+\]|[^.]+

In code:

String s = "House.Car2.$4.45$.[0]";
Pattern pattern = Pattern.compile("\\$[^$]+\\$|\\[[^\\]]+\\]|[^.]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
   System.out.println(matcher.group());
}

Output:

House
Car2
$4.45$
[0]

ideone demo
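
The pattern above does not cover the update to the question, where a $...$ token may be followed by a <...> part that can also contain dots. One possible extension, offered as a sketch rather than a definitive fix, is to let the first alternative optionally consume a <...> suffix. It assumes the <...> part directly follows the closing $ and contains no nested >; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class DollarAngleTokenizer {
    // Alternatives, tried in order:
    //   1. $...$ optionally followed directly by <...>
    //   2. [...]
    //   3. any run of non-dot characters
    private static final Pattern TOKEN = Pattern.compile(
        "\\$[^$]+\\$(?:<[^>]+>)?|\\[[^\\]]+\\]|[^.]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("House.Car.$abc.def$<ghi.jk>.[0].bla")) {
            System.out.println(token);
        }
        // prints:
        // House
        // Car
        // $abc.def$<ghi.jk>
        // [0]
        // bla
    }
}
```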

If not using regex is an option, you can write your own parser that iterates once over all characters in the string, tracking whether the current character is inside $...$, [...] or <...>.

  • When you find a character other than . just add it to the token you are building, like any ordinary character.
  • Do the same when you find a . that is inside one of the previously mentioned "areas".
  • But if you find a . outside of these areas, you need to split on it, which means adding the token built so far to the result and clearing the buffer for the next token.

Such a parser can look like this:

// requires java.util.ArrayList and java.util.List
public static List<String> parse(String input){
    // list which will hold the returned tokens
    List<String> tokens = new ArrayList<>();

    // flags representing whether the currently tested character is inside
    // one of the special areas
    // (at the start we are outside of these areas, so they are set to false)
    boolean insideDollar = false;         // $...$
    boolean insideSquareBrackets = false; // [...]
    boolean insideAngleBrackets = false;  // <...>

    // we need a buffer to build tokens; StringBuilder is a good fit here
    StringBuilder sb = new StringBuilder();

    // now let's iterate over all characters and decide whether to add them
    // to the current token or to add the token to the result list
    for (char ch : input.toCharArray()){

        // first update which area we are in:
        // finding $ means that we either start or end a `$...$` area, so
        // simply negating the flag is enough to update its status
        if (ch == '$') insideDollar = !insideDollar;
        // the other flags are set on the opening bracket and cleared on the closing one
        else if (ch == '[') insideSquareBrackets = true;
        else if (ch == ']') insideSquareBrackets = false;
        else if (ch == '<') insideAngleBrackets = true;
        else if (ch == '>') insideAngleBrackets = false;

        // now we know which area we are in, so let's handle the cases:
        // if the character is not a dot,
        // OR it is a dot but we are inside one of the areas, we just
        // add it to the token (append it to the StringBuilder)
        if (ch != '.' || insideAngleBrackets || insideDollar || insideSquareBrackets){
            sb.append(ch);
        }else{// otherwise we are handling a dot outside of the special
              // areas, so it is a separator: we don't add it to the token, but
              // add the buffer's value (the current token) to the results and
              // reset the buffer for the next token
            tokens.add(sb.toString());
            sb.setLength(0);
        }
    }
    // since the buffer is only flushed to the token list when we find a dot
    // to split on, the last token is still in the buffer (there is no dot
    // after it), so we need to add it manually after the loop
    if (sb.length()>0)// a non-empty token needs to be added to the result
        tokens.add(sb.toString());

    return tokens;
}

You can use it like this:

String  input = "House.Car2.$abc.def$<ghi.jk>.[0]";
for (String s: parse(input))
    System.out.println(s);

Output:

House
Car2
$abc.def$<ghi.jk>
[0]


 