
Using the Scanner delimiter in Java, how do I keep the String that I am using as the delimiter?

My program reads a story from a file sentence by sentence, using punctuation as the delimiter. It stores the sentences in an ArrayList, shuffles the list, and prints it, creating a different story every time you run the program. My problem is that the delimiter strips the punctuation from the new story. Is there a way I can still use the delimiter but keep the matched String as part of what I am reading?

You can use Scanner's default whitespace delimiter to scan through your file's content, then use Pattern and Matcher to find the position of your punctuation within each Scanner token.

Here's an example:

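// needs java.io.File, java.util.*, java.util.regex.Matcher and java.util.regex.Pattern;
// new Scanner(File) throws FileNotFoundException, so declare or catch it
final List<String> sentences = new ArrayList<>();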
final Scanner scanner = new Scanner(new File("story.txt"));
final Pattern pattern = Pattern.compile("[.!?]");

StringBuilder sb = new StringBuilder();

// default white space delimiter
while (scanner.hasNext()) {
    String token = scanner.next().trim();

    // look for pattern in current token
    Matcher matcher = pattern.matcher(token);
    if (matcher.find()) {

        // get end position of match
        int index = matcher.end();

        // add to sentence the substring from beginning of token to the end match position
        sb.append(token.substring(0, index));

        // build and add your sentence
        sentences.add(sb.toString().trim());

        // start new sentence
        sb = new StringBuilder(token.substring(index));

    } else {
        // no punctuation match, add token to sentence
        sb.append(token);
    }

    // add space to sentence
    sb.append(" ");
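}
scanner.close();

// keep any trailing text that has no closing punctuation
if (sb.toString().trim().length() > 0) {
    sentences.add(sb.toString().trim());
}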

Collections.shuffle(sentences);
for (String sentence : sentences) {
    System.out.println(sentence);
}

You can always scan a single character at a time if the language of your story doesn't always use whitespace (e.g. Chinese).
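As a rough illustration of the character-at-a-time idea, here is a minimal sketch that reads from a Reader instead of a Scanner. The class name CharSentenceSplitter and the set of terminators (ASCII . ! ? plus the full-width forms U+3002, U+FF01, U+FF1F) are assumptions made for this example, not part of the answer above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class CharSentenceSplitter {
    // Assumed sentence terminators: . ! ? and their full-width counterparts
    private static final String TERMINATORS = ".!?\u3002\uFF01\uFF1F";

    static List<String> split(Reader in) {
        List<String> sentences = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            // read one character at a time; no whitespace is required
            while ((c = in.read()) != -1) {
                sb.append((char) c);
                if (TERMINATORS.indexOf(c) >= 0) {
                    // the terminator is already in sb, so it is kept
                    addIfNotBlank(sentences, sb);
                    sb.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // trailing text that has no terminator
        addIfNotBlank(sentences, sb);
        return sentences;
    }

    private static void addIfNotBlank(List<String> sentences, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            sentences.add(s);
        }
    }

    public static void main(String[] args) {
        System.out.println(split(new StringReader("A b. C d? E")));
    }
}
```

Note that this sketch treats every terminator as the end of a sentence, so a run such as "?!" would produce a one-character sentence for the second mark.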

Hope this helps!

I had this same problem and landed here, but the previous answer did not fit my needs. After some trial and error, this is what I came up with, so I came back to share it in case it helps someone later:

General Solution

Use Scanner#findInLine (or even Scanner#findWithinHorizon) to capture the delimiters from the input stream as you go:

/* This method does not close the given scanner. The caller must do that (typically after the loop that calls this method). */
public String getNextPattern(Scanner s, String pattern) {
    s.useDelimiter(pattern);
    if(!s.hasNext()) {
        return null;
    }
    s.next();
    return s.findInLine(pattern);
}
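
For the findWithinHorizon variant, a sketch along the following lines might work. The class and method names (HorizonDemo, nextDelimiter) are made up for this example. Passing a horizon of 0 tells the Scanner to search without a bound, so the delimiter can include a line break, whereas findInLine only looks as far as the next line separator.

```java
import java.util.Scanner;

// Hypothetical variant of getNextPattern, for illustration only.
class HorizonDemo {
    // Like getNextPattern, this does not close the given scanner.
    static String nextDelimiter(Scanner s, String pattern) {
        s.useDelimiter(pattern);
        if (!s.hasNext()) {
            return null;
        }
        s.next(); // skip the token in front of the delimiter
        // horizon 0 = unbounded search, so the match may cross a line break
        return s.findWithinHorizon(pattern, 0);
    }

    public static void main(String[] args) {
        try (Scanner s = new Scanner("one;\n two")) {
            String delimiter = nextDelimiter(s, ";\\s*");
            System.out.println("delimiter length: " + delimiter.length());
            System.out.println("next token: " + s.next());
        }
    }
}
```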

Explanation

What's noteworthy here is that the Scanner actually leaves delimiters on the input stream. So all this method is doing is:

  1. Setting the delimiter to what we want to match in the stream
  2. Advancing past the next token (i.e. discarding the input that does not match the delimiter)
  3. Pulling the delimiter off the stream. Given the way Scanner works, we know the delimiter will be the next text on the stream.

This solution is one approach for extracting occurrences of any regex pattern out of a stream or file.
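
The three steps above can be sketched as a small loop that collects every match of a regex. PatternExtractor and extractAll are hypothetical names used only for this example, and the null check is there because findInLine returns null when the last token is not followed by a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper for illustration only.
class PatternExtractor {
    static List<String> extractAll(String input, String regex) {
        List<String> matches = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter(regex);                  // step 1: the pattern is the delimiter
            while (s.hasNext()) {
                s.next();                           // step 2: discard the non-matching token
                String match = s.findInLine(regex); // step 3: pull the delimiter itself
                if (match != null) {
                    matches.add(match);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("abc 123 def 45 gh", "\\d+"));
    }
}
```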


Solution to Your Specific Problem

In my case, I discarded the tokens. In your case, the tokens are the sentences you want to keep, so you'd store them in your ArrayList as you go. Something along these lines would solve your specific problem:

// simplistic approach to handling whitespace
private static final String PUNCTUATION_PATTERN = "[.!?]\\s*";

// for example purposes, read from stdin and write to stdout
public void shuffleStory(InputStream input) {
    try(Scanner s = new Scanner(input)) {
        s.useDelimiter(PUNCTUATION_PATTERN);
        List<String> sentences = new ArrayList<>();
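        while(s.hasNext()) {
            String sentence = s.next().trim();
            // findInLine returns null if the last sentence has no closing punctuation
            String punctuation = s.findInLine(PUNCTUATION_PATTERN);
            sentences.add(punctuation == null ? sentence : sentence + punctuation.trim());
        }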
        Collections.shuffle(sentences);
        System.out.println(String.join(" ", sentences));
    }
}
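
A different technique, not used in either answer above, is to make the delimiter itself exclude the punctuation by using a zero-width lookbehind. The sketch below assumes sentences are separated by whitespace after the punctuation; LookbehindSplitter and AFTER_PUNCTUATION are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical alternative for illustration only.
class LookbehindSplitter {
    // The delimiter is the whitespace that follows . ! or ?
    // The lookbehind consumes nothing, so the punctuation stays in the token.
    private static final String AFTER_PUNCTUATION = "(?<=[.!?])\\s+";

    static List<String> split(String story) {
        List<String> sentences = new ArrayList<>();
        try (Scanner s = new Scanner(story)) {
            s.useDelimiter(AFTER_PUNCTUATION);
            while (s.hasNext()) {
                sentences.add(s.next());
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("Hi there. How are you? Fine!"));
    }
}
```

With this pattern, punctuation that is not followed by whitespace (for example the dot in "3.14") does not end a sentence.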

