
Java - Split String into sentences with character limitation

I want to split a text into sentences (splitting on . or with BreakIterator). But: each sentence must not exceed 100 characters.

Example:

Lorem ipsum dolor sit. Amet consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua. At vero eos et accusam
et justo duo dolores.

To (3 elements; a sentence may be broken, but never a word):

" Lorem ipsum dolor sit. ",
" Amet consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
  ut labore et dolore magna",
" aliquyam erat, sed diam voluptua. At vero eos et accusam
  et justo duo dolores. "

How can I do this properly?

There is probably a better way to do this, but it goes something like this:

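import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    public static void main(String... args) {

        String originalString = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt ut labore "
                + "et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores.";

        // Split into sentences on '.'; note that split() drops the '.' itself.
        String[] s1 = originalString.split("\\.");
        List<String> list = new ArrayList<String>();

        for (String s : s1) {
            if (s.length() > 100) {
                // Cut an overlong sentence into 100-character chunks.
                // \G anchors each match at the end of the previous one; this can cut in the middle of a word.
                list.addAll(Arrays.asList(s.split("(?<=\\G.{100})")));
            } else {
                list.add(s);
            }
        }

        System.out.println(list);
    }
}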

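The "split string in size" regex comes from this SO question. You could probably merge the two regexes into one, but I'm not sure that would be a wise idea (:

If the regex does not work on Android (the \\G operator is not recognized everywhere), try the other solutions linked there for splitting a string by size.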
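
For reference, a regex-free way to cut a string into fixed-size pieces can be sketched with plain substring calls. This is only a minimal sketch, not one of the linked solutions; the class name FixedSizeChunks and the helper chunk are made up for illustration. Like the \\G regex, it can cut in the middle of a word.

```java
import java.util.ArrayList;
import java.util.List;

class FixedSizeChunks {

    // Hypothetical helper (not from the thread): cuts s into pieces of at most
    // `size` characters using substring only, so no \G support is needed.
    static List<String> chunk(String s, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < s.length(); start += size) {
            // The last piece may be shorter than size.
            parts.add(s.substring(start, Math.min(s.length(), start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4)); // prints [abcd, efgh, ij]
    }
}
```

In the answer above, list.addAll(chunk(s, 100)) could stand in for the \\G-based split call.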

Regex won't help you much in this case.

I would split the text on whitespace or . and then start concatenating. Something like this:

Pseudocode

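words = text.split("[\s\.]");   // note: the separators themselves are discarded
lines = new List();
while ( words.length() > 0 ) {

  String line = new String();
  // keep appending while words remain and the next one still fits
  while ( words.length() > 0 && line.length() + words.get(0).length() < 100 ) {
    line += words.get(0);
    words.remove(0);
  }

  lines.add(line);

}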

Solved (thanks to Macarse for the inspiration):

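// The lookahead splits before each whitespace character or '.',
// so every token keeps its leading separator.
String[] words = text.split("(?=[\\s\\.])");
ArrayList<String> array = new ArrayList<String>();
int i = 0;
// Caveat: a single token of 100 or more characters never fits,
// so i stops advancing and this loop does not terminate.
while (words.length > i) {
    String line = "";
    while ( words.length > i && line.length() + words[i].length() < 100 ) {
        line += words[i];
        i++;
    }
    array.add(line);
}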
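
The difference from the pseudocode is the lookahead in the split pattern: "[\\s\\.]" consumes the separators, while "(?=[\\s\\.])" splits in front of them and keeps them attached to the following token. A small sketch to make that visible (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class LookaheadSplitDemo {

    // Splits ON whitespace or '.', so the separators are dropped.
    static String[] consuming(String text) {
        return text.split("[\\s\\.]");
    }

    // Splits BEFORE whitespace or '.', so each token keeps its separator.
    static String[] keeping(String text) {
        return text.split("(?=[\\s\\.])");
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum. Dolor";
        // prints [Lorem, ipsum, , Dolor]
        System.out.println(Arrays.toString(consuming(text)));
        // prints [Lorem,  ipsum, .,  Dolor]
        System.out.println(Arrays.toString(keeping(text)));
    }
}
```

Because the tokens keep their separators, concatenating them rebuilds the original text, which is why the solved version can simply use line += words[i].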

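Following the previous solution, I quickly ran into an infinite loop when a single word exceeds the limit (very unlikely, but unfortunately my environment is very constrained). So I added a fix (of sorts, I think) for this edge case.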

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main
{
    public static void main(String[] args) {
        List<String> lines = sentenceToLines("In which of the following, a person is constantly followed/chased by another person or group of several people?", 15);
        for (String line : lines) {
            System.out.println(line);
        }
    }

    private static List<String> sentenceToLines(String s, int limit) {
        // split before each whitespace character or '.', keeping the separator with its token
        List<String> wordList = new ArrayList<String>(Arrays.asList(s.split("(?=[\\s\\.])")));
        List<String> array = new ArrayList<String>();
        int i = 0;
        while (i < wordList.size()) {
            // split the long words to the size of the limit, so the next token always fits
            for (int j = i; wordList.get(j).length() > limit; j++) {
                String word = wordList.get(j);
                wordList.set(j, word.substring(0, limit));
                wordList.add(j + 1, word.substring(limit));
            }
            // continue making lines with newly split words
            String line = "";
            while ( i < wordList.size() && line.length() + wordList.get(i).length() <= limit ) {
                line += wordList.get(i);
                i++;
            }
            array.add(line.trim());
        }
        return array;
    }

}
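
The question also mentions BreakIterator, which none of the snippets above use. A minimal sketch of that route follows, under the assumption that each sentence should become its own element and that only overlong sentences are wrapped at line-break opportunities (this differs slightly from the example in the question, where the remainder of a wrapped sentence is merged with the next sentence). The class BreakIteratorSplit and its methods are made up for illustration; only the java.text.BreakIterator calls are standard API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorSplit {

    // Hypothetical helper: one element per sentence; sentences longer than
    // `limit` are wrapped into several elements.
    static List<String> split(String text, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> out = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.length() <= limit) {
                addIfNotEmpty(out, sentence);
            } else {
                wrap(sentence, limit, out);
            }
        }
        return out;
    }

    // Wraps one overlong sentence at line-break opportunities (word boundaries).
    static void wrap(String sentence, int limit, List<String> out) {
        BreakIterator breaks = BreakIterator.getLineInstance(Locale.ENGLISH);
        breaks.setText(sentence);
        int lineStart = 0;
        while (lineStart < sentence.length()) {
            int end = Math.min(sentence.length(), lineStart + limit);
            if (end < sentence.length() && !breaks.isBoundary(end)) {
                int previous = breaks.preceding(end);
                // If no break opportunity lies inside the window, the word
                // itself is longer than the limit: fall back to a hard cut.
                if (previous > lineStart) {
                    end = previous;
                }
            }
            addIfNotEmpty(out, sentence.substring(lineStart, end).trim());
            lineStart = end;
        }
    }

    static void addIfNotEmpty(List<String> out, String piece) {
        if (!piece.isEmpty()) {
            out.add(piece);
        }
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, "
                + "sed diam nonumy eirmod tempor invidunt ut labore et dolore "
                + "magna aliquyam erat, sed diam voluptua. At vero eos et accusam "
                + "et justo duo dolores.";
        for (String piece : split(text, 100)) {
            System.out.println(piece.length() + ": " + piece);
        }
    }
}
```

The hard-cut fallback plays the same role as the long-word fix in the answer above: every pass through the loop consumes at least one character, so it cannot spin forever.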

