Split and add the string based on length

I have a paragraph as an input string. I'm trying to split the paragraph into an array in which each element contains one or more complete sentences and is no longer than 250 characters.

I tried splitting the string on a delimiter (a period) and collecting the sentences into a list. Then, using a StringBuilder, I'm trying to append the sentences to one another until the length limit (250 characters) is reached.

    List<String> list = new ArrayList<String>();

    String text = "Perhaps far exposed age effects. Now distrusts you her delivered applauded affection out sincerity. As tolerably recommend shameless unfeeling he objection consisted. She although cheerful perceive screened throwing met not eat distance. Viewing hastily or written dearest elderly up weather it as. So direction so sweetness or extremity at daughters. Provided put unpacked now but bringing. Unpleasant astonished an diminution up partiality. Noisy an their of meant. Death means up civil do an offer wound of. Called square an in afraid direct. Resolution diminution conviction so mr at unpleasing simplicity no. No it as breakfast up conveying earnestly immediate principle. Him son disposed produced humoured overcame she bachelor improved. Studied however out wishing but inhabit fortune windows. ";

    Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
            Pattern.MULTILINE | Pattern.COMMENTS);

    Matcher reMatcher = re.matcher(text);
    while (reMatcher.find()) {
        list.add(reMatcher.group());
    }
    String textDelimted[] = new String[list.size()];
    textDelimted = list.toArray(textDelimted);

    StringBuilder stringB = new StringBuilder(100);

    for (int i = 0; i < textDelimted.length; i++) {
        while (stringB.length() + textDelimted[i].length() < 250)
            stringB.append(textDelimted[i]);

        System.out.println("!#@#$%" +stringB.toString());
    }
}

Expected result:

[0] : Perhaps far exposed age effects. Now distrusts you her delivered applauded affection out sincerity. As tolerably recommend shameless unfeeling he objection consisted. She although cheerful perceive screened throwing met not eat distance.

[1] : Viewing hastily or written dearest elderly up weather it as. So direction so sweetness or extremity at daughters. Provided put unpacked now but bringing. Unpleasant astonished an diminution up partiality. Noisy an their of meant.

[2] : Death means up civil do an offer wound of. Called square an in afraid direct. Resolution diminution conviction so mr at unpleasing simplicity no. No it as breakfast up conveying earnestly immediate principle.

[3] : Him son disposed produced humoured overcame she bachelor improved. Studied however out wishing but inhabit fortune windows.

Your question is unclear; please reword it to make obvious exactly what your problem is.

That being said, I am assuming from your description that you want to split a String wherever a "." appears and convert the result to a List<String>. That can be done as follows:

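import java.util.Arrays;
import java.util.List;

String input = "hello.world.with.delimiters";
// split() takes a regex, so the dot must be escaped
String[] words = input.split("\\.");  // String[] with contents {"hello", "world", "with", "delimiters"}
List<String> list = Arrays.asList(words);  // Identical contents, just in a fixed-size List<String>

// If you want to append to a StringBuilder based on length
// (someLengthCondition() is a placeholder that you would implement yourself)
StringBuilder sb = new StringBuilder();
for (String s : list) {
    if (someLengthCondition(s.length())) sb.append(s);  // append the element, not the whole list
}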

Of course, your implementation of someLengthCondition() will depend on what you want. I can't provide one as it is hard to understand what you are trying to do.
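
Note that split("\\.") drops the periods themselves. If the sentences should keep their terminating punctuation, one option is to split on the whitespace that follows it, using a lookbehind. This is only a minimal sketch: it assumes every sentence ends with ".", "!" or "?" followed by whitespace, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitSketch {
    // Splits on whitespace that is preceded by '.', '!' or '?',
    // so the punctuation stays attached to its sentence.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {  // blank input yields one empty token; skip it
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [Hello world., How are you?, Fine!]
        System.out.println(splitSentences("Hello world. How are you? Fine!"));
    }
}
```

This simple pattern also splits after abbreviations such as "Mr. Smith", so it is probably only suitable for text like the sample paragraph.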

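I think you need to modify your loop just a little bit. Your while loop keeps appending the same sentence until the buffer is nearly full and never starts a new buffer. Use an if instead, and when the next sentence no longer fits, print the buffer and start a fresh one. With that change, my results match your expected grouping.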

import java.util.List;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MyClass {
    public static void main(String args[]) {

        List<String> list = new ArrayList<String>();

        String text = "Perhaps far exposed age effects. Now distrusts you her delivered applauded affection out sincerity. As tolerably recommend shameless unfeeling he objection consisted. She although cheerful perceive screened throwing met not eat distance. Viewing hastily or written dearest elderly up weather it as. So direction so sweetness or extremity at daughters. Provided put unpacked now but bringing. Unpleasant astonished an diminution up partiality. Noisy an their of meant. Death means up civil do an offer wound of. Called square an in afraid direct. Resolution diminution conviction so mr at unpleasing simplicity no. No it as breakfast up conveying earnestly immediate principle. Him son disposed produced humoured overcame she bachelor improved. Studied however out wishing but inhabit fortune windows. ";

        Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
                Pattern.MULTILINE | Pattern.COMMENTS);

        Matcher reMatcher = re.matcher(text);
        while (reMatcher.find()) {
            list.add(reMatcher.group());
        }
        String textDelimted[] = new String[list.size()];
        textDelimted = list.toArray(textDelimted);

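        StringBuilder stringB = new StringBuilder(300);

        for (int i = 0; i < textDelimted.length; i++) {
            // The +1 accounts for the space that rejoins two sentences
            if (stringB.length() > 0 && stringB.length() + 1 + textDelimted[i].length() > 250) {
                // The next sentence would exceed the limit: emit this chunk and start a new one
                System.out.println("!#@#$%" + stringB.toString());
                stringB = new StringBuilder(300);
            }
            if (stringB.length() > 0) {
                stringB.append(' ');  // the regex matches exclude the whitespace between sentences
            }
            stringB.append(textDelimted[i]);
        }
        // Emit the last, partially filled chunk
        System.out.println("!#@#$%" + stringB.toString());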
    }
}

To collect the results in a list instead of printing them, replace each println call with an add:

ArrayList<String> arrlist = new ArrayList<String>(5);
..
arrlist.add(stringB.toString());
..
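
Put together, the grouping step could look roughly like the helper below. It is a sketch rather than a definitive implementation: the names SentencePackerSketch and pack are hypothetical, sentences are rejoined with a single space, and a sentence that is longer than the limit on its own is kept as an oversized chunk instead of being cut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SentencePackerSketch {
    // Greedily joins consecutive sentences with a single space and starts a
    // new chunk whenever adding the next sentence would exceed maxLen characters.
    static List<String> pack(List<String> sentences, int maxLen) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            boolean hasContent = current.length() > 0;
            // +1 for the joining space
            if (hasContent && current.length() + 1 + sentence.length() > maxLen) {
                chunks.add(current.toString());
                current.setLength(0);
                hasContent = false;
            }
            if (hasContent) {
                current.append(' ');
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());  // last, partially filled chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("aaa.", "bbb.", "ccc.");
        // prints [aaa. bbb., ccc.]
        System.out.println(pack(sentences, 9));
    }
}
```

Calling pack(list, 250) on the sentence list built by the regex loop above should give the four chunks from the question, since the limit is checked before each append.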
