
Java8: Looking for a better way of parsing a text of “key: value” lines

I have a String of text lines.
Some of the lines have a format of "key: value". Others should be ignored.
I have a fixed (pre-defined) list of keys that I need to extract values for and put into a HashMap.
So, I'm doing something like this:

BufferedReader reader = new BufferedReader(new StringReader(memoText));

reader.lines().forEach(line->{
    if(line.startsWith("prefix1")){
        // Some code is required here to get the value1
    }  
    else if(line.startsWith("prefix2")){
        // Some code is required here to get the value2
    }  
    ...
});

Is there a better way of implementing the parsing in Java 8?
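For reference, here is a minimal sketch of the question's if/else-if skeleton with the value extraction filled in. It assumes each value is everything after the first ':' with surrounding whitespace trimmed. The class name ForEachParser is hypothetical, and prefix1/prefix2 are the question's placeholder keys. Every additional key needs another branch, which is what the answers below avoid.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class ForEachParser {

    // The question's skeleton, completed. "prefix1" and "prefix2" are placeholder keys.
    static Map<String, String> parse(String memoText) {
        Map<String, String> result = new HashMap<>();
        BufferedReader reader = new BufferedReader(new StringReader(memoText));
        reader.lines().forEach(line -> {
            // test "prefix1:" rather than "prefix1" so that e.g. "prefix10: x" is not picked up
            if (line.startsWith("prefix1:")) {
                result.put("prefix1", valueOf(line));
            } else if (line.startsWith("prefix2:")) {
                result.put("prefix2", valueOf(line));
            }
            // any other line is ignored
        });
        return result;
    }

    // Assumption: the value is the text after the first ':', trimmed.
    private static String valueOf(String line) {
        return line.substring(line.indexOf(':') + 1).trim();
    }

    public static void main(String[] args) {
        System.out.println(parse("prefix1: a\nsome other line\nprefix2: b"));
    }
}
```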

Based on your current problem statement, you can try the code below, which:

  • Reads a file and creates a stream of its lines
  • Matches each line against a regex
  • Filters out all lines that do not match the pattern
  • Collects the matched groups into a Map

You may want to adapt it to your needs:

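import static java.util.stream.Collectors.toMap;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
//skipped
Pattern pattern = Pattern.compile("([a-zA-Z]+)\\s*:\\s*(.*)");
// Files.lines reads lazily and throws IOException; try-with-resources closes the file
try (Stream<String> stream = Files.lines(Paths.get("<PATH_TO_FILE>"))) {
    Map<String, String> results =
            stream.map(pattern::matcher)
                    .filter(Matcher::find)   // keep lines in which the pattern is found
                    .collect(toMap(a -> a.group(1), a -> a.group(2)));
}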
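The code above filters with Matcher::find, which succeeds if the pattern occurs anywhere in the line, whereas Matcher::matches requires the whole line to match. A small sketch of the difference; the class name and the sample comment line are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    static final Pattern PATTERN = Pattern.compile("([a-zA-Z]+)\\s*:\\s*(.*)");

    // find(): succeeds if "key: value" occurs anywhere in the line
    static String keyWithFind(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // matches(): succeeds only if the entire line is "key: value"
    static String keyWithMatches(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "# see also: the manual";
        System.out.println(keyWithFind(line));    // also
        System.out.println(keyWithMatches(line)); // null
    }
}
```

If comment or prose lines may contain a colon, matches (or anchoring the pattern with ^ and $) is the safer choice.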

Let me know if this is not what you are looking for.

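import java.util.*;
import java.util.stream.Collectors;

// define your fixed keys in a list
List<String> keys = Arrays.asList("key1", "key2");
Map<String, String> result = reader.lines()
      // use filter instead of if-else: keep lines whose text before the first ':' is a known key
      .filter(line -> line.indexOf(":") > -1 && keys.contains(line.substring(0, line.indexOf(":"))))
      // collect into a map: key = text before the first ':', value = text after it (trimmed)
      .collect(Collectors.toMap(
              line -> line.substring(0, line.indexOf(":")),
              line -> line.substring(line.indexOf(":") + 1).trim()));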

But you must make sure that every key appears on only one line; otherwise toMap throws java.lang.IllegalStateException: Duplicate key.
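A sketch of both behaviours on top of the same indexOf approach, assuming the keys key1 and key2 from the answer; the class and method names are hypothetical. The two-argument toMap throws on a duplicate, while a merge function passed as the third argument decides which value survives; here the first one wins.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IndexOfParser {
    // hypothetical fixed keys, as in the answer above
    static final Set<String> KEYS = new HashSet<>(Arrays.asList("key1", "key2"));

    // key = text before the first ':', value = text after it, trimmed
    static final Function<String, String> KEY = line -> line.substring(0, line.indexOf(':'));
    static final Function<String, String> VALUE = line -> line.substring(line.indexOf(':') + 1).trim();

    // lines of the form "<known key>:<anything>"
    private static Stream<String> keyLines(String text) {
        return new BufferedReader(new StringReader(text)).lines()
            .filter(line -> line.indexOf(':') > -1 && KEYS.contains(KEY.apply(line)));
    }

    // two-argument toMap: throws IllegalStateException on a duplicate key
    static Map<String, String> parseStrict(String text) {
        return keyLines(text).collect(Collectors.toMap(KEY, VALUE));
    }

    // three-argument toMap: the merge function keeps the first value seen for a key
    static Map<String, String> parseFirstWins(String text) {
        return keyLines(text).collect(Collectors.toMap(KEY, VALUE, (first, second) -> first));
    }
}
```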

You can certainly use split to do this, but for cases like this I think regexes are more flexible. Also note that, following your example, this parses from a string, so I've omitted exception handling for, and closing of, the BufferedReader.
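For comparison, a minimal split-based sketch using the same sample keys foo and bar as the Java 8 version below; the class name is hypothetical. The limit argument 2 keeps any further colons inside the value, and trim() stands in for the \s* handling of the regex.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class SplitParser {
    static final Set<String> ALLOWED_KEYS = new HashSet<>(Arrays.asList("foo", "bar"));

    static Map<String, String> parseKeysValues(String memoText) {
        return new BufferedReader(new StringReader(memoText)).lines()
            .map(line -> line.split(":", 2))     // limit 2: the value may itself contain ':'
            .filter(parts -> parts.length == 2)   // no ':' means not a key-value line
            .filter(parts -> ALLOWED_KEYS.contains(parts[0].trim()))
            .collect(Collectors.toMap(parts -> parts[0].trim(), parts -> parts[1].trim()));
    }

    public static void main(String[] args) {
        System.out.println(parseKeysValues("foo: fooValue\r\n# comment\r\nbar: a:b\r\n").get("bar")); // a:b
    }
}
```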

Here's a Java 8 version:

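import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

static String memoText = "foo: fooValue\r\n" +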
                         "otherKey: otherValue\r\n" +
                         "# something else like a comment line\r\n" +
                         "bar: barValue\r\n";

static Map<String, String> parseKeysValues(String memoText) {
    Pattern pattern = Pattern.compile("([a-zA-Z]+)\\s*:\\s*(.*)");
    Set<String> allowedKeys = new HashSet<>(Arrays.asList("foo", "bar"));
    return new BufferedReader(new StringReader(memoText)).lines()
        .map(pattern::matcher)
        .filter(Matcher::matches)
        .filter(m -> allowedKeys.contains(m.group(1)))
        .collect(Collectors.toMap(m -> m.group(1), m -> m.group(2)));
}

The idea is to match each line in the stream against a pattern whose groups capture the key and the value. Of course, you can adjust the pattern to match whatever characters are valid for keys and values, to trim whitespace, etc. Then filter(Matcher::matches) lets through only the successful matches. At this point, regex group 1 is the key and group 2 is the value, so we can filter for only the allowed keys and then put the results into a Map.
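For instance, a variant of the pattern that also tolerates whitespace around the key and trims trailing whitespace from the value might look like this. The pattern and class name are illustrative assumptions, not part of the answer; keys are assumed to consist of word characters.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TrimmingParser {
    // Leading \s* absorbs indentation; the reluctant (.*?) leaves trailing whitespace
    // to the final \s*, so group 2 comes out trimmed.
    // Assumption: keys are word characters (letters, digits, '_').
    static final Pattern PATTERN = Pattern.compile("\\s*(\\w+)\\s*:\\s*(.*?)\\s*");
    static final Set<String> ALLOWED_KEYS = new HashSet<>(Arrays.asList("foo", "bar"));

    static Map<String, String> parseKeysValues(String memoText) {
        return new BufferedReader(new StringReader(memoText)).lines()
            .map(PATTERN::matcher)
            .filter(Matcher::matches)             // whole line must be "key: value"
            .filter(m -> ALLOWED_KEYS.contains(m.group(1)))
            .collect(Collectors.toMap(m -> m.group(1), m -> m.group(2)));
    }
}
```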

This will throw an exception if there are duplicate keys. To implement a different policy, add a third argument to toMap that will merge the new value with the existing one. For example, use (a, b) -> b to implement a last-one-wins policy.
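If no value should be discarded at all, a different collector sidesteps merging: groupingBy combined with mapping collects every value for a key into a List. A sketch reusing the answer's pattern and sample keys; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MultiValueParser {
    static final Pattern PATTERN = Pattern.compile("([a-zA-Z]+)\\s*:\\s*(.*)");
    static final Set<String> ALLOWED_KEYS = new HashSet<>(Arrays.asList("foo", "bar"));

    // Keeps every value: each key maps to the list of its values in input order.
    static Map<String, List<String>> parseAllValues(String memoText) {
        return new BufferedReader(new StringReader(memoText)).lines()
            .map(PATTERN::matcher)
            .filter(Matcher::matches)
            .filter(m -> ALLOWED_KEYS.contains(m.group(1)))
            .collect(Collectors.groupingBy(
                m -> m.group(1),
                Collectors.mapping(m -> m.group(2), Collectors.toList())));
    }
}
```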

In Java 9, this will get somewhat simpler:

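import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.stream.Collectors;

static Map<String, String> parseKeysValues9(String memoText) {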
    Set<String> allowedKeys = Set.of("foo", "bar");
    return new Scanner(memoText).findAll("(?m)^([a-zA-Z]+)\\s*:\\s*(.*)$")
        .filter(mr -> allowedKeys.contains(mr.group(1)))
        .collect(Collectors.toMap(mr -> mr.group(1), mr -> mr.group(2), (a, b) -> b));
}

Here, we initialize the set of allowed keys with the new Set.of static factory method. We also parse the input using Scanner instead of BufferedReader. The new findAll method produces a stream of MatchResult containing all matches from the input. A small wrinkle is that we have to modify the pattern to deal with line endings, since we're no longer reading line by line. By default, ^ and $ match only at the beginning and end of the entire input. We insert the (?m) directive to enable MULTILINE mode so that ^ and $ match at the beginning and end of each line, respectively. Finally, as before, we filter by allowed keys and then collect to a Map. This example shows the last-one-wins merge function as the third argument to toMap.
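Scanner.findAll and Set.of are not available in Java 8, but the same MULTILINE pattern works there with a plain Matcher.find() loop. A minimal sketch using the same sample keys; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineParser8 {
    // Same MULTILINE pattern as the Java 9 version: ^ and $ match at line boundaries.
    static final Pattern PATTERN = Pattern.compile("(?m)^([a-zA-Z]+)\\s*:\\s*(.*)$");
    static final Set<String> ALLOWED_KEYS = new HashSet<>(Arrays.asList("foo", "bar"));

    static Map<String, String> parseKeysValues(String memoText) {
        Map<String, String> result = new HashMap<>();
        Matcher m = PATTERN.matcher(memoText);
        while (m.find()) {  // each successful find() is one "key: value" line
            if (ALLOWED_KEYS.contains(m.group(1))) {
                result.put(m.group(1), m.group(2));  // put overwrites, so the last one wins
            }
        }
        return result;
    }
}
```

Because `.` does not match line terminators such as \r, the value group stops before a Windows-style \r\n line ending.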
