Creating regular expression for splitting

I'm currently studying informatics and know what a regex is (though not in Java). I have an input like:

"String1  ( Nr = 323) String2 String3 (Nr  = 3)"

I wanted to split it by using:

split("[ ()=]");

because I think this would split on every " ", "(", ")" and "=". Am I right, or do I need to put a + behind it? And if this is already right, could I add a * so that it also splits on something like "(("?

If this isn't the problem, then my other question regarding regex in Java is how I can check whether my String contains only numbers.

I tried:

contains(".*\\d+.*")
matches(".*\\d+.*")

But I'm pretty sure one of them is working. So my problem should be with the splitting regex.

My original problem is that I get a NumberFormatException for my split String array at index 2, which should normally be "323".

Can I use my regex with a *, like "[ ()=]*"?

Thanks in advance

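Yes, it will split on those characters, but it will not produce the expected results: without a quantifier, every single separator character is its own delimiter, so a run of separators such as "  ( " leaves empty strings in the array. You need to use a quantifier with your character class. I recommend +, meaning "one or more" times.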

String s = "String1  ( Nr = 323) String2 String3 (Nr  = 3)";
String[] parts = s.split("[ ()=]+");
System.out.println(Arrays.toString(parts));

Output

[String1, Nr, 323, String2, String3, Nr, 3]
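To see where the NumberFormatException from the question comes from, here is a small sketch comparing both patterns on the same input. The class and method names (SplitQuantifierDemo, splitSingle, splitRuns) are made up for illustration and are not from the original post.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SplitQuantifierDemo {
    static final String INPUT = "String1  ( Nr = 323) String2 String3 (Nr  = 3)";

    // Every single separator character is a delimiter,
    // so adjacent separators produce empty strings in between.
    static String[] splitSingle(String s) {
        return s.split("[ ()=]");
    }

    // A whole run of separator characters counts as one delimiter.
    static String[] splitRuns(String s) {
        return s.split("[ ()=]+");
    }

    public static void main(String[] args) {
        String[] single = splitSingle(INPUT);
        String[] runs = splitRuns(INPUT);

        System.out.println(Arrays.toString(single));
        System.out.println(Arrays.toString(runs));

        // Without the quantifier, index 2 holds an empty string,
        // and Integer.parseInt("") throws NumberFormatException.
        System.out.println(single[2].isEmpty());

        // With the quantifier, index 2 is "323" and parses fine.
        System.out.println(Integer.parseInt(runs[2]));
    }
}
```

So the exception in the question most likely comes from parsing an empty string, not from the digits themselves.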

A regex that is useful for splitting a string

  • must describe all separators between the strings you want to have
  • must not match the empty string
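The second point is why "[ ()=]*" from the question is not a good idea: a * pattern also matches the empty string, and the empty string can be found between any two characters. A minimal sketch (class and method names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SplitStarDemo {
    // "[ ()=]*" matches zero or more separators, so it also matches
    // the empty gap between two ordinary characters.
    static String[] splitStar(String s) {
        return s.split("[ ()=]*");
    }

    // Helper: is the given token one of the parts?
    static boolean containsToken(String[] parts, String token) {
        return Arrays.asList(parts).contains(token);
    }

    public static void main(String[] args) {
        String[] parts = splitStar("Nr = 323");
        System.out.println(Arrays.toString(parts));

        // "323" gets chopped into single characters, so it is no longer a token.
        System.out.println(containsToken(parts, "323")); // false
    }
}
```

Every token ends up at most one character long, which is rarely what you want from a split.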

You have non-space separators ()= surrounded by spaces between the strings you want to have. You could be generous and use

"[ ()=]+"   any mixture

or fiddly (requiring exactly one of the ()= ) and do

"\\s*[()=]\\s*"

to make sure it is only *one* of the trio ()= . With this regex, a split of "foo (( bar" would give you three strings, the middle one being empty.
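A quick sketch of how the stricter pattern behaves, assuming the two sample inputs from this thread (the class name SplitStrictDemo and the method splitStrict are made up for the example):

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SplitStrictDemo {
    // Exactly one of ( ) = is required, with optional whitespace around it.
    static String[] splitStrict(String s) {
        return s.split("\\s*[()=]\\s*");
    }

    public static void main(String[] args) {
        // Two adjacent "(" are two separate delimiters,
        // so an empty string appears between them.
        System.out.println(Arrays.toString(splitStrict("foo (( bar")));

        // A plain space is no longer a delimiter on its own,
        // so "String2 String3" stays together as one token.
        String input = "String1  ( Nr = 323) String2 String3 (Nr  = 3)";
        System.out.println(Arrays.toString(splitStrict(input)));
    }
}
```

Note the trade-off: the strict pattern keeps "String2 String3" as a single token, while "[ ()=]+" separates them.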

To check whether a string only contains numbers you'll have to define "numbers". There is a general confusion between "digit" and "number", and a "number" might be signed or unsigned, an integer or a fraction, and so on. The plural "numbers" implies several of them, separated by at least one space.

"\\s*(\\d+)(\\s+\\d+)*\\s*"

describes unsigned integers separated and surrounded by (optional) spaces. In this simple case, also

"[\\s\\d]+"

will do, although it also accepts a string that consists only of spaces.
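For the two attempts in the question: String.contains takes a literal character sequence, not a regex, and matches(".*\\d+.*") only tests whether there is at least one digit somewhere. A small sketch of the differences, with hypothetical class and method names:

```java
// Hypothetical demo class; names are illustrative only.
class DigitCheckDemo {
    // True only if the whole string is one or more digits.
    static boolean onlyDigits(String s) {
        return s.matches("\\d+");
    }

    // True if the string contains at least one digit anywhere.
    static boolean hasDigit(String s) {
        return s.matches(".*\\d+.*");
    }

    // Unsigned integers separated and surrounded by optional whitespace.
    static boolean onlyUnsignedInts(String s) {
        return s.matches("\\s*(\\d+)(\\s+\\d+)*\\s*");
    }

    // Any mixture of digits and whitespace, including whitespace only.
    static boolean digitsAndSpaces(String s) {
        return s.matches("[\\s\\d]+");
    }

    public static void main(String[] args) {
        System.out.println(onlyDigits("323"));          // true
        System.out.println(onlyDigits("a323"));         // false
        System.out.println(hasDigit("a323"));           // true
        System.out.println(onlyUnsignedInts(" 12 34 ")); // true
        System.out.println(onlyUnsignedInts("   "));    // false
        System.out.println(digitsAndSpaces("   "));     // true

        // contains() looks for the literal text ".*\d+.*", so this is false.
        System.out.println("323".contains(".*\\d+.*"));
    }
}
```

If "only numbers" means a single unsigned integer, "\\d+" with matches is probably the simplest check.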
