My program reads a story from a file sentence by sentence, using punctuation as the delimiter. It stores the sentences in an ArrayList, shuffles the list, and prints it, creating a different story every time the program runs. My problem is that using the delimiter strips the punctuation from the new story. Is there a way to keep using the delimiter but retain the punctuation as part of what I read?
You can use Scanner's default white-space delimiter to scan through your file's content, then use pattern/matcher to find the position of your punctuation delimiter within each scanner token.
Here's an example:
// imports this snippet needs
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final List<String> sentences = new ArrayList<>();
final Pattern pattern = Pattern.compile("[.!?]");
StringBuilder sb = new StringBuilder();
// default white space delimiter; try-with-resources closes the scanner
try (Scanner scanner = new Scanner(new File("story.txt"))) {
    while (scanner.hasNext()) {
        String token = scanner.next().trim();
        // look for pattern in current token
        Matcher matcher = pattern.matcher(token);
        if (matcher.find()) {
            // get end position of match
            int index = matcher.end();
            // add to sentence the substring from beginning of token to the end match position
            sb.append(token.substring(0, index));
            // build and add your sentence
            sentences.add(sb.toString().trim());
            // start new sentence
            sb = new StringBuilder(token.substring(index));
        } else {
            // no punctuation match, add token to sentence
            sb.append(token);
        }
        // add space to sentence
        sb.append(" ");
    }
}
// keep any trailing text that has no closing punctuation
String rest = sb.toString().trim();
if (!rest.isEmpty()) {
    sentences.add(rest);
}
Collections.shuffle(sentences);
for (String sentence : sentences) {
    System.out.println(sentence);
}
You can always scan a single character at a time if the language of your story doesn't separate words with white space (e.g. Chinese).
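For text without white space between words, the character-by-character idea might look roughly like the sketch below. The class name CharSentenceSplitter, the method split, and the terminator set (ASCII plus the full-width marks U+3002, U+FF01 and U+FF1F) are assumptions made for illustration, and it works on a String rather than a file to stay self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into sentences one character at a time.
final class CharSentenceSplitter {
    // Assumed terminators: . ! ? plus full-width ideographic stop, ! and ?
    private static final String TERMINATORS = ".!?\u3002\uFF01\uFF1F";

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c);
            // a terminator closes the current sentence and is kept with it
            if (TERMINATORS.indexOf(c) >= 0) {
                addIfNotBlank(sentences, sb);
                sb.setLength(0);
            }
        }
        // trailing text without a terminator is kept as a final sentence
        addIfNotBlank(sentences, sb);
        return sentences;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("Hi there. \u4F60\u597D\u3002OK?"));
    }
}
```

The resulting list can be shuffled and printed in the same way as in the example above.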
Hope this helps!
I had this same problem and landed here, but the previous answer did not fit my needs. After some trial and error, this is what I came up with, so I came back to share it in case it helps someone later:
Use Scanner#findInLine (or even Scanner#findWithinHorizon) to capture delimiters off the input stream along the way:
/* This method does not close the given scanner. That must happen elsewhere (typically in a loop that calls this). */
public String getNextPattern(Scanner s, String pattern) {
    s.useDelimiter(pattern);
    if (!s.hasNext()) {
        return null;
    }
    s.next();
    return s.findInLine(pattern);
}
What's noteworthy here is that the Scanner actually leaves delimiters on the input stream. So all this method does is set the pattern as the delimiter, skip the next token, and then read the delimiter that follows it.
This solution is one approach for extracting occurrences of any regex pattern out of a stream or file.
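To make that behavior concrete, here is a small sketch that calls the method in a loop and collects every delimiter it returns. The wrapper class DelimiterExtractor and the method extractAll are hypothetical names added for illustration, and the input is a plain String rather than a file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical wrapper around the getNextPattern method shown above.
final class DelimiterExtractor {
    // Same logic as above: skip one token, then read the delimiter after it.
    static String getNextPattern(Scanner s, String pattern) {
        s.useDelimiter(pattern);
        if (!s.hasNext()) {
            return null;
        }
        s.next();
        return s.findInLine(pattern);
    }

    // Collects delimiters until getNextPattern signals there are no more.
    static List<String> extractAll(String input, String pattern) {
        List<String> found = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            String match;
            while ((match = getNextPattern(s, pattern)) != null) {
                found.add(match);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("One. Two! Three?", "[.!?]"));
    }
}
```

Note that this loop stops as soon as a token is not followed by a delimiter on the same line, since getNextPattern returns null in that case as well.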
In my case, I discarded the tokens. In your case, those are sentences that you want to keep so you'd want to store those in your ArrayList as you go. Something along these lines would solve your specific problem:
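import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

// simplistic approach to handling whitespace
private static final String PUNCTUATION_PATTERN = "[.!?]\\s*";

// for example purposes, read from the given stream (e.g. stdin) and write to stdout
public void shuffleStory(InputStream input) {
    try (Scanner s = new Scanner(input)) {
        s.useDelimiter(PUNCTUATION_PATTERN);
        List<String> sentences = new ArrayList<>();
        while (s.hasNext()) {
            String sentence = s.next();
            // findInLine returns null if the last sentence has no closing punctuation
            String punctuation = s.findInLine(PUNCTUATION_PATTERN);
            sentences.add(punctuation == null ? sentence : sentence + punctuation.trim());
        }
        Collections.shuffle(sentences);
        System.out.println(String.join(" ", sentences));
    }
}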
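Because findInLine only searches up to the next line separator, a variant built on Scanner#findWithinHorizon with a horizon of 0 (meaning no bound) may cope better with stories whose sentences end at a line break. The following is a minimal sketch under that assumption, not a drop-in replacement: the class name HorizonSentenceReader and the method readSentences are made up for illustration, it reads from a String to stay self-contained, and collapsing internal whitespace to single spaces is an extra choice of mine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical variant that reads the delimiter with findWithinHorizon.
final class HorizonSentenceReader {
    private static final String PUNCTUATION_PATTERN = "[.!?]\\s*";

    static List<String> readSentences(String story) {
        List<String> sentences = new ArrayList<>();
        try (Scanner s = new Scanner(story)) {
            s.useDelimiter(PUNCTUATION_PATTERN);
            while (s.hasNext()) {
                // collapse line breaks inside a sentence into single spaces
                String sentence = s.next().trim().replaceAll("\\s+", " ");
                // horizon 0: search without bound, so whitespace after the
                // punctuation is consumed even across line breaks
                String mark = s.findWithinHorizon(PUNCTUATION_PATTERN, 0);
                // mark is null when the final sentence has no punctuation
                sentences.add(mark == null ? sentence : sentence + mark.trim());
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(readSentences("One. Two!\nThree?\n\nFour"));
    }
}
```

The returned list can then be passed to Collections.shuffle and joined exactly as in shuffleStory.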