
Removing whitespace at the beginning of a string with regex gives null (Java)

I would like to extract groups from a string that is loaded from a txt file. The file looks something like this (notice the space at the beginning of the file):

 as431431af,87546,3214| 5a341fafaf,3365,54465      | 6adrT43   ,  5678  ,            5655

The first part of each group, up to the first comma, can contain digits and letters; the second and third parts contain only digits. After each | the same structure repeats.

First, I load the txt file into a string:

String readFile3 = readFromTxtFile("/resources/file.txt");

Then I remove all whitespace with a regex:

String no_whitespace = readFile3.replaceAll("\\s+", "");

After that I try to get the groups:

Pattern p = Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*", Pattern.MULTILINE);
Matcher m = p.matcher(no_whitespace);
int lastMatchPos = 0;
while (m.find()) {
    System.out.println(m.group());
    lastMatchPos = m.end();
}
if (lastMatchPos != no_whitespace.length()) {
    System.out.println("Invalid string!");
}

Now I would like to remove the "," from each group and assign every value to its own variable, but I am getting these groups (notice the null):

nullas431431af,87546,3214
5a341fafaf,3365,54465
6adrT43,5678,5655

What am I doing wrong? Even when I physically remove the space from the beginning of the txt file, the same result occurs. Is there an easier way to get the groups in this string with regex and assign each part, separated by ",", to its own variable?

You can split on | surrounded by optional whitespace, and then split each of the resulting items on , surrounded by optional whitespace:

String str = "as431431af,87546,3214| 5a341fafaf,3365,54465      | 6adrT43   ,  5678  ,            5655";
String[] items = str.split("\\s*\\|\\s*");
List<String[]> res = new ArrayList<>();
for(String i : items) {
    String[] parts = i.split("\\s*,\\s*");
    res.add(parts);
    System.out.println(parts[0] + " - " + parts[1] + " - " + parts[2]);
}

This prints:

as431431af - 87546 - 3214
5a341fafaf - 3365 - 54465
6adrT43 - 5678 - 5655

The results are in the res list.
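
Since the question asks for each value to end up in its own variable, the same split can feed a small holder object. This is only a sketch: the Entry class and its field names (id, first, second) are hypothetical and not from the original post, and it assumes every group has exactly three fields with the last two fitting in an int.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordParser {
    // Hypothetical holder for one "id,first,second" group; the names are illustrative only.
    static final class Entry {
        final String id;
        final int first;
        final int second;

        Entry(String id, int first, int second) {
            this.id = id;
            this.first = first;
            this.second = second;
        }
    }

    static List<Entry> parse(String input) {
        List<Entry> entries = new ArrayList<>();
        // trim() drops whitespace around the whole input, e.g. the leading space in the file
        for (String item : input.trim().split("\\s*\\|\\s*")) {
            String[] parts = item.split("\\s*,\\s*");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields but got: " + item);
            }
            // parseInt throws NumberFormatException if a numeric field is malformed
            entries.add(new Entry(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return entries;
    }

    public static void main(String[] args) {
        String str = " as431431af,87546,3214| 5a341fafaf,3365,54465      | 6adrT43   ,  5678  ,            5655";
        for (Entry e : parse(str)) {
            System.out.println(e.id + " - " + e.first + " - " + e.second);
        }
    }
}
```

Converting the numeric fields with Integer.parseInt is a design choice of this sketch; if the values may exceed the int range, keeping them as strings or using Long.parseLong would be safer.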

Note that

  • \s* - matches zero or more whitespaces
  • \| - matches a pipe char

The pattern that you tried uses only the optional quantifier *, so it can also match a string that consists of nothing but commas.
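
To make that concrete, here is a small sketch comparing the pattern from the question with a variant that uses + instead of *. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Pattern from the question: every part may be empty
    static final Pattern LOOSE = Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*");
    // Same shape, but every part needs at least one character
    static final Pattern STRICT = Pattern.compile("[a-zA-Z0-9]+,\\d+,\\d+");

    static boolean matchesLoose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean matchesStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesLoose(",,"));                  // true: all three parts are empty
        System.out.println(matchesStrict(",,"));                 // false
        System.out.println(matchesStrict("6adrT43,5678,5655"));  // true
    }
}
```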

You also don't need Pattern.MULTILINE as there are no anchors in the pattern.
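
Pattern.MULTILINE only changes where ^ and $ are allowed to match, so it has no effect on a pattern without anchors. A minimal sketch of the difference, using a hypothetical helper that counts matches:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Hypothetical helper: counts how often the regex is found in the input with the given flags
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String anchored = "^\\d+$";
        String unanchored = "[a-zA-Z0-9]*,\\d*,\\d*";

        // With anchors the flag matters: ^ and $ match at every line only in MULTILINE mode
        System.out.println(count(anchored, 0, "12\n34"));                  // 0
        System.out.println(count(anchored, Pattern.MULTILINE, "12\n34"));  // 2

        // Without anchors the flag changes nothing
        System.out.println(count(unanchored, 0, "a1,2,3\nb4,5,6"));                  // 2
        System.out.println(count(unanchored, Pattern.MULTILINE, "a1,2,3\nb4,5,6"));  // 2
    }
}
```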


You can use 3 capture groups with + as the quantifier, so that each part must contain at least one character, and after each group of three parts either match a pipe | or assert the end of the string with $:

([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\||$)


For example

String readFile3 = "as431431af,87546,3214| 5a341fafaf,3365,54465      | 6adrT43   ,  5678  ,            5655";
String no_whitespace = readFile3.replaceAll("\\s+", "");
Pattern p = Pattern.compile("([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\\||$)");
Matcher matcher = p.matcher(no_whitespace);
while (matcher.find()) {
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println(matcher.group(i));
    }
    System.out.println("--------------------------------");
}

Output

as431431af
87546
3214
--------------------------------
5a341fafaf
3365
54465
--------------------------------
6adrT43
5678
5655
--------------------------------
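
As for the literal null in front of the first group: replaceAll("\\s+", "") cannot produce it, so it is probably already in the string that readFromTxtFile returns. That method is not shown in the question, so the following is only a guess at a common cause: a String accumulator that starts as null and is extended with +=. In Java, concatenating null with a string yields the text "null", which the character class [a-zA-Z0-9] then happily matches. The method names below are hypothetical.

```java
public class NullPrefixDemo {
    // Hypothetical reconstruction of a buggy reader; the real readFromTxtFile is not shown in the post
    static String joinBuggy(String[] lines) {
        String content = null;      // should have been ""
        for (String line : lines) {
            content += line;        // null + " abc" becomes "null abc"
        }
        return content;
    }

    // One way to avoid the problem: accumulate in a StringBuilder
    static String joinFixed(String[] lines) {
        StringBuilder content = new StringBuilder();
        for (String line : lines) {
            content.append(line);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        String[] lines = { " as431431af,87546,3214" };
        System.out.println(joinBuggy(lines).replaceAll("\\s+", ""));  // nullas431431af,87546,3214
        System.out.println(joinFixed(lines).replaceAll("\\s+", ""));  // as431431af,87546,3214
    }
}
```

If the reader does look like joinBuggy, initializing the accumulator with an empty string or switching to a StringBuilder should remove the prefix; this would also explain why deleting the leading space in the file made no difference.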
