
Java: how to separate string into parts using regex?

I have to parse a Java String into 3 separate cases:

  1. If it has the form "PREFIX(<signed_float>)=<Some_alpha_num_string>", I need to extract <signed_float> into one (Double) variable and <Some_alpha_num_string> into another (String) variable, and ignore the rest.
  2. Otherwise, if it has the form "PREFIX=<Some_alpha_num_string>", I save <Some_alpha_num_string> and set the Double to some default (say 0.0).
  3. Otherwise I do nothing

So I guess the regex for #1 and #2 would be PREFIX[\\(]?[-]?[0-9]*\\.?[0-9]*[\\)]?=\\S*, but how do I use it to extract the two pieces?

BTW, I don't need to worry about the float being expressed in scientific ("%e") notation.

UPDATE: A bit of clarification: PREFIX is a fixed string. So examples of valid strings would be:

  • PREFIX=fOo1234bar -- here I need to extract fOo1234bar
  • PREFIX(-1.23456)=SomeString -- here I need to extract -1.23456 and SomeString
  • PREFIX(0.20)=1A2b3C -- here I need to extract 0.20 and 1A2b3C

Given your regex, I'll assume that <signed_float> does not support scientific notation.

A regex for matching a float/double is listed in the javadoc for Double.valueOf(String).

In that case, the regex would be:

PREFIX           Matching exact letters "PREFIX"
(?:              Start optional section
  \(              Matching exact character "("
  (               Start content capture #1 <signed_float>
    [+-]?          Matches optional sign
    (?:            Start choice section
      \d+\.?\d*     Matches <digits> ["."] [<digits>]
    |              Choice separator
      \.\d+         Matches "." <digits>
    )              End choice section
  )               End content capture #1
  \)              Matching exact character ")"
)?               End optional section
=                Matching exact character "="
(\S*)            Capture #2 <Some_alpha_num_string>
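
As a side note, the annotated layout above can be kept almost as-is in Java source by compiling with the Pattern.COMMENTS flag, which makes the engine ignore whitespace in the pattern and treat everything from # to the end of the line as a comment. A minimal sketch, assuming a made-up class name VerbosePrefixPattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class VerbosePrefixPattern {
    // Same expression as the breakdown above. With Pattern.COMMENTS,
    // whitespace is ignored and '#' starts a comment that runs to the newline.
    static final Pattern PREFIX_PATTERN = Pattern.compile(
        "PREFIX              # the fixed prefix\n" +
        "(?:                 # optional section\n" +
        "  \\(               # opening parenthesis\n" +
        "  (                 # group 1: the signed float\n" +
        "    [+-]?           # optional sign\n" +
        "    (?:\\d+\\.?\\d*  # digits, optional dot, optional digits\n" +
        "      |\\.\\d+      # or a dot followed by digits\n" +
        "    )\n" +
        "  )\n" +
        "  \\)               # closing parenthesis\n" +
        ")?                  # end of optional section\n" +
        "=                   # the equals sign\n" +
        "(\\S*)              # group 2: everything up to the next whitespace\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = PREFIX_PATTERN.matcher("PREFIX(-1.23456)=SomeString");
        if (m.matches()) {
            System.out.println(m.group(1) + " / " + m.group(2));
        }
    }
}
```

The trade-off is that a literal space or # in the pattern would have to be escaped in this mode; this pattern contains neither.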

Or as a string:

"PREFIX(?:\\(([+-]?(?:\\d+\\.?\\d*|\\.\\d+))\\))?=(\\S*)"

Let's test it:

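import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixTest {
    // Compiled once. Group 1 is the optional float, group 2 is the text after '='.
    private static final Pattern P =
        Pattern.compile("PREFIX(?:\\(([+-]?(?:\\d+\\.?\\d*|\\.\\d+))\\))?=(\\S*)");

    public static void main(String[] args) {
        test("PREFIX=fOo1234bar");
        test("PREFIX(-1.23456)=SomeString");
        test("PREFIX(0.20)=1A2b3C");
        test("sadfsahlhjladf");
    }

    private static void test(String text) {
        Matcher m = P.matcher(text);
        if (!m.matches())
            System.out.println("<do nothing>");            // case 3: no match at all
        else if (m.group(1) == null)
            System.out.println("'" + m.group(2) + "'");     // case 2: no "(float)" part
        else                                                // case 1: float and string
            System.out.println(Double.parseDouble(m.group(1)) + ", '" + m.group(2) + "'");
    }
}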

Output:

'fOo1234bar'
-1.23456, 'SomeString'
0.2, '1A2b3C'
<do nothing>
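
The test above only prints. To end up with the two variables the question asks for, including the 0.0 default for case 2, the same pattern can be wrapped in a small helper. This is only a sketch: PrefixParser, Parsed and DEFAULT_VALUE are names invented for the example, and returning an empty Optional is one possible way to express "do nothing".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the question.
class PrefixParser {
    private static final Pattern P =
        Pattern.compile("PREFIX(?:\\(([+-]?(?:\\d+\\.?\\d*|\\.\\d+))\\))?=(\\S*)");

    // Default used when the "(float)" part is absent (case 2 in the question).
    static final double DEFAULT_VALUE = 0.0;

    // Simple holder for the two extracted pieces.
    static final class Parsed {
        final double number;
        final String text;

        Parsed(double number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    // Returns the parsed pieces, or an empty Optional when the input matches neither form.
    static Optional<Parsed> parse(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        String floatPart = m.group(1); // null when the optional section did not participate
        double number = (floatPart == null) ? DEFAULT_VALUE : Double.parseDouble(floatPart);
        return Optional.of(new Parsed(number, m.group(2)));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "PREFIX=fOo1234bar", "PREFIX(-1.23456)=SomeString", "PREFIX(0.20)=1A2b3C", "sadfsahlhjladf"
        };
        for (String s : inputs) {
            String shown = parse(s)
                .map(p -> p.number + ", '" + p.text + "'")
                .orElse("<do nothing>");
            System.out.println(s + " -> " + shown);
        }
    }
}
```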

If I understand what you're trying to do:

I would make one expression for the "PREFIX()=" case and another for "PREFIX=". I would test with the first; if it matches, run the logic for that case, and if it doesn't, try the next one. That gives you two simpler regular expressions to worry about. The Matcher you get back from a Pattern reports where the match starts and ends (start() and end()), so you can use substring on the original string to extract what you've found.
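
A rough sketch of that two-step approach, assuming PREFIX is the fixed literal from the question. The class and method names are made up, and the float sub-pattern is borrowed from the other answer rather than anything prescribed here. The second branch uses start(1) and end(1) with substring to show the offsets at work; m.group(1) would return the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; shows "try the specific pattern first, then the simpler one".
class TwoPatternParser {
    // Case 1: PREFIX(<signed_float>)=<value>
    static final Pattern WITH_NUMBER =
        Pattern.compile("PREFIX\\(([+-]?(?:\\d+\\.?\\d*|\\.\\d+))\\)=(\\S*)");
    // Case 2: PREFIX=<value>
    static final Pattern WITHOUT_NUMBER = Pattern.compile("PREFIX=(\\S*)");

    static String describe(String input) {
        Matcher m = WITH_NUMBER.matcher(input);
        if (m.matches()) {
            return Double.parseDouble(m.group(1)) + ", '" + m.group(2) + "'";
        }
        m = WITHOUT_NUMBER.matcher(input);
        if (m.matches()) {
            // Offsets of group 1 in the original string, used with substring.
            String value = input.substring(m.start(1), m.end(1));
            return "0.0, '" + value + "'"; // 0.0 is the default from the question
        }
        return "<do nothing>";
    }

    public static void main(String[] args) {
        System.out.println(describe("PREFIX(0.20)=1A2b3C"));
        System.out.println(describe("PREFIX=fOo1234bar"));
        System.out.println(describe("sadfsahlhjladf"));
    }
}
```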

You don't say whether PREFIX is a fixed size; if not, then capture groups might help you separate PREFIX from the float value. Just remember: it is REAL easy for the use of regular expressions to become harder than the problem you're trying to solve.

"I had a problem and decided to solve it with regular expressions. Now I've got two problems."
