
Storing each sentence in an array from a document in Java?

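I want to split a document into sentences and store each sentence in its own array, where each array element is one word of that sentence. But I can't get any further than this: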

int count =0,len=0;
String sentence[];
String words[][];
sentence = name.split("\\.");
count = sentence.length;

System.out.print("total sentence: " );
System.out.println(count);
int h;  
words = new String[count][]; 

for (h = 0; h < count; h++) {
     String tmp[] = sentence[h].split(" ");
     words[h] = tmp;
     len = len + words[h].length;
     System.out.println("total words: " );
     System.out.print(len); 

     temp = sentence[h].split(delimiter);  

     for(int i = 0; i < temp.length; i++) {
        System.out.print(len);
        System.out.println(temp[i]);
        len++;
     }  
}

I couldn't follow your code, but here is how to achieve your stated intent in just 3 lines of code:

String document; // read from somewhere

List<List<String>> words = new ArrayList<>();
for (String sentence : document.split("[.?!]\\s*"))
    words.add(Arrays.asList(sentence.split("[ ,;:]+")));

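If you want to convert the Lists to arrays, use List.toArray(), but I wouldn't recommend it. Lists are far easier to work with than arrays. For one thing, they expand automatically (one reason the code above is so compact).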
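For anyone who does need arrays, here is a minimal sketch of that conversion, starting from the nested list built above. The class name SentenceArrays and the helper names splitIntoWords and toArrays are made up for illustration; only split, Arrays.asList and List.toArray are standard library calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceArrays {

    // Split a document into sentences, then each sentence into words
    // (same regular expressions as in the answer above).
    static List<List<String>> splitIntoWords(String document) {
        List<List<String>> words = new ArrayList<>();
        for (String sentence : document.split("[.?!]\\s*"))
            words.add(Arrays.asList(sentence.split("[ ,;:]+")));
        return words;
    }

    // Convert the nested list into a jagged String[][] using List.toArray.
    static String[][] toArrays(List<List<String>> words) {
        String[][] result = new String[words.size()][];
        for (int i = 0; i < words.size(); i++)
            result[i] = words.get(i).toArray(new String[0]);
        return result;
    }

    public static void main(String[] args) {
        String[][] arrays = toArrays(splitIntoWords("Hello world. How are you? Fine, thanks!"));
        // Prints [[Hello, world], [How, are, you], [Fine, thanks]]
        System.out.println(Arrays.deepToString(arrays));
    }
}
```

Each row of the result has its own length, which is why the outer array is allocated with only its first dimension.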

Addendum: (most) characters do not need to be escaped inside a character class.
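A small sketch to illustrate the addendum: inside square brackets, the characters . ? ! are already literal, so the escaped and unescaped classes split identically. The class and method names here are hypothetical. A few characters, such as the backslash and the closing bracket, still need escaping inside a class.

```java
import java.util.Arrays;

public class CharClassEscapes {

    // Character class without escapes: '.', '?' and '!' are literal inside [...].
    static String[] splitUnescaped(String text) {
        return text.split("[.?!]");
    }

    // The same class with redundant escapes.
    static String[] splitEscaped(String text) {
        return text.split("[\\.\\?\\!]");
    }

    public static void main(String[] args) {
        String text = "One? Two! Three.";
        System.out.println(Arrays.toString(splitUnescaped(text)));
        System.out.println(Arrays.toString(splitEscaped(text)));
        // Both patterns produce the same pieces.
        System.out.println(Arrays.equals(splitUnescaped(text), splitEscaped(text)));
    }
}
```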

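Your input string seems to be stored in name. I do not understand what the inner for loop is supposed to do: it prints len in front of every word, but len is a running word total across sentences, not an index into the current one.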

String sentences[];
String words[][];

// End punctuation marks are ['.', '?', '!']
sentences = name.split("[\\.\\?\\!]"); 

System.out.println("num of sentences: " + sentences.length);

// Allocate storage for (sentences.length) new arrays of strings
words = new String[sentences.length][];

// For each sentence
for (int h = 0; h < sentences.length; h++) {
  // Remove spaces from beginning and end of sentence (to avoid 0-length words)
  // split by any white space character sequence (caution if using Unicode!)
  words[h] = sentences[h].trim().split("\\s+"); 

  // Print out length of sentence.
  System.out.println("words (in sentence " + (h+1) + "): " + words[h].length);
}
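On the Unicode caution in the comments above: by default Java's \s matches only ASCII whitespace, so a non-breaking space (U+00A0), for example, does not separate words. The following is a sketch of one way to handle that, using the embedded flag (?U), which enables Pattern.UNICODE_CHARACTER_CLASS. The class and method names are illustrative, and whether your documents contain such characters at all is an assumption.

```java
import java.util.Arrays;

public class UnicodeWhitespace {

    // Default behaviour: \s covers only ASCII whitespace (space, tab, newline, ...).
    static String[] splitAscii(String sentence) {
        return sentence.trim().split("\\s+");
    }

    // (?U) switches on UNICODE_CHARACTER_CLASS, so \s also matches
    // Unicode white space such as the non-breaking space U+00A0.
    static String[] splitUnicode(String sentence) {
        return sentence.trim().split("(?U)\\s+");
    }

    public static void main(String[] args) {
        String sentence = "good\u00A0morning everyone";
        System.out.println(splitAscii(sentence).length);   // non-breaking space is not a separator here
        System.out.println(Arrays.toString(splitUnicode(sentence)));
    }
}
```

Note that String.trim() itself only strips characters up to U+0020, so leading or trailing Unicode spaces would still need separate handling.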
