I want to split a document into sentences and store each sentence in its own array, where each array element is one word of that sentence. But I can't get any further than this.
int count =0,len=0;
String sentence[];
String words[][];
sentence = name.split("\\.");
count = sentence.length;
System.out.print("total sentence: " );
System.out.println(count);
int h;
words = new String[count][];
for (h = 0; h < count; h++) {
String tmp[] = sentence[h].split(" ");
words[h] = tmp;
len = len + words[h].length;
System.out.println("total words: " );
System.out.print(len);
temp = sentence[h].split(delimiter);
for(int i = 0; i < temp.length; i++) {
System.out.print(len);
System.out.println(temp[i]);
len++;
}
}
I can't understand your code, but here's how to achieve your stated intention with just 3 lines:
String document; // read from somewhere
List<List<String>> words = new ArrayList<>();
for (String sentence : document.split("[.?!]\\s*"))
words.add(Arrays.asList(sentence.split("[ ,;:]+")));
If you want to convert the Lists to arrays, use List.toArray(), but I wouldn't recommend it. Lists are far easier to deal with than arrays. For one, they expand automatically (one reason why the above code is so dense).
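If you do need arrays, here is a minimal sketch of the conversion using `List.toArray`. The class name `ListToArrayDemo` and the helper `toJaggedArray` are made up for illustration, and the sample words are placeholders rather than real document content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListToArrayDemo {
    // Hypothetical helper: copies each inner list into its own String[] row.
    static String[][] toJaggedArray(List<List<String>> words) {
        String[][] result = new String[words.size()][];
        for (int i = 0; i < words.size(); i++) {
            // toArray(new String[0]) returns a String[] sized to the list
            result[i] = words.get(i).toArray(new String[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> words = new ArrayList<>();
        words.add(Arrays.asList("Hello", "world"));
        words.add(Arrays.asList("How", "are", "you"));

        String[][] array = toJaggedArray(words);
        System.out.println(Arrays.deepToString(array));
        // prints [[Hello, world], [How, are, you]]
    }
}
```

Each row has its own length, so the result is a jagged array like the `String[][]` in the question.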
Addendum: (most) characters don't need escaping inside a character class.
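To illustrate the point about escaping, the sketch below splits the same text with an unescaped and an escaped character class and checks that the results match. The class and method names are hypothetical, and the sample sentence is only a placeholder. Characters such as `\`, `]`, a leading `^`, and a `-` between two characters are the usual exceptions that still need care.

```java
import java.util.Arrays;

class CharClassDemo {
    // Unescaped: '.', '?' and '!' are literal inside [...]
    static String[] splitPlain(String text) {
        return text.split("[.?!]");
    }

    // Escaped: legal, but the backslashes are redundant here
    static String[] splitEscaped(String text) {
        return text.split("[\\.\\?\\!]");
    }

    public static void main(String[] args) {
        String text = "Hi there. How are you? Fine!";
        String[] plain = splitPlain(text);
        String[] escaped = splitEscaped(text);

        System.out.println(plain.length);                   // prints 3
        System.out.println(Arrays.equals(plain, escaped));  // prints true
    }
}
```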
It seems like your input string is stored in name. I do not understand what the inner for loop is supposed to do: it prints len repeatedly, but does not update it!
String sentences[];
String words[][];
// End punctuation marks are ['.', '?', '!']
sentences = name.split("[\\.\\?\\!]");
System.out.println("num of sentences: " + sentences.length);
// Allocate storage for (sentences.length) new arrays of strings
words = new String[sentences.length][];
// For each sentence
for (int h = 0; h < sentences.length; h++) {
// Remove spaces from beginning and end of sentence (to avoid 0-length words)
// split by any white space character sequence (caution if using Unicode!)
words[h] = sentences[h].trim().split("\\s+");
// Print out length of sentence.
System.out.println("words (in sentence " + (h+1) + "): " + words[h].length);
}
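The reason for the trim() call can be seen in a small sketch: after splitting on end punctuation, every sentence except the first usually starts with a space, and splitting that on whitespace yields an empty first "word". The class name `TrimSplitDemo` and the helper `countWords` are made up for this illustration.

```java
class TrimSplitDemo {
    // Hypothetical helper: counts the pieces produced by split("\\s+"),
    // optionally trimming the sentence first.
    static int countWords(String sentence, boolean trimFirst) {
        String s = trimFirst ? sentence.trim() : sentence;
        return s.split("\\s+").length;
    }

    public static void main(String[] args) {
        String sentence = " How are you"; // leading space left over from the sentence split

        System.out.println(countWords(sentence, false)); // prints 4 (first element is "")
        System.out.println(countWords(sentence, true));  // prints 3
    }
}
```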