
Calculate number of word occurrences in Stream<String> with characters in front or behind

I'm searching large log files for specific words. I've found some basic solutions for the case where the words are separated by whitespace. But what I need is to find all occurrences of a specific word that can be surrounded by any characters.

For example, when looking for "hello", "abchello" should return 1 and "##hello123...@456hello8" should return 2.

I could do that with basic for loops, but I want to use mostly streams (and perhaps parallel streams) for this, hoping for a speed gain when going through large files.

The following seems to find any version of "hello", but it stops at the first match in each line and moves on to the next line, so every line is counted at most once:

// Counts the lines that contain "hello", not the occurrences within each line
BufferedReader bufferReader = Files.newBufferedReader(Paths.get(file));
long count = bufferReader.lines().filter(l -> l.matches(".*hello.*")).count();

Using org.apache.commons.lang3.StringUtils#countMatches:

// Sums the per-line occurrence counts (needs Apache Commons Lang 3 on the classpath)
BufferedReader bufferReader = Files.newBufferedReader(Paths.get(file));
int count = bufferReader.lines().mapToInt(line -> StringUtils.countMatches(line, "hello")).sum();
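
If adding Commons Lang is not an option, the same per-line counting can probably be done with the JDK alone. The following is a minimal sketch, not part of the original answer: the class name RegexWordCount and the helper names countInLine and countInFile are made up for illustration, and Matcher.results() assumes Java 9 or later. Pattern.quote is used so the search word is treated literally rather than as a regular expression.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class RegexWordCount {
    // Counts non-overlapping occurrences of the pattern in a single line.
    public static long countInLine(String line, Pattern pattern) {
        return pattern.matcher(line).results().count(); // Matcher.results() needs Java 9+
    }

    // Sums the per-line counts over a whole file.
    // try-with-resources closes the underlying file when the stream is done.
    public static long countInFile(Path file, String word) throws IOException {
        Pattern pattern = Pattern.compile(Pattern.quote(word)); // match the word literally
        try (Stream<String> lines = Files.lines(file)) {
            return lines.mapToLong(line -> countInLine(line, pattern)).sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data written to a temporary file for the demo.
        Path tmp = Files.createTempFile("wordcount", ".log");
        Files.write(tmp, List.of("abchello", "##hello123...@456hello8", "nothing here"));
        System.out.println(countInFile(tmp, "hello")); // prints 3
        Files.delete(tmp);
    }
}
```

The key difference from the filter(...).count() attempt in the question is that mapToLong turns each line into its own occurrence count, and sum() adds those counts up, instead of counting matching lines.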

More ways to count matches: Occurrences of substring in a string
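
One of those other ways is a plain String.indexOf loop, which avoids both regular expressions and external libraries. The sketch below is an illustration under assumptions rather than code from the linked question: the class name IndexOfWordCount and the method countOccurrences are hypothetical, matches are counted without overlap, and an empty search word is treated as zero matches. It also shows the parallel variant the question asks about; whether parallel() actually speeds things up on a large file depends on the workload and would need to be measured.

```java
import java.util.List;

class IndexOfWordCount {
    // Counts non-overlapping occurrences of word in line using String.indexOf.
    public static int countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0; // assumption: an empty search word counts as zero matches
        }
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(word, from)) != -1) {
            count++;
            from += word.length(); // jump past this match so matches do not overlap
        }
        return count;
    }

    public static void main(String[] args) {
        // The two sample strings from the question.
        List<String> lines = List.of("abchello", "##hello123...@456hello8");
        // Addition is associative, so the sum is the same with or without a parallel stream.
        int total = lines.parallelStream()
                .mapToInt(l -> countOccurrences(l, "hello"))
                .sum();
        System.out.println(total); // prints 3
    }
}
```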
