Large string split into lines with maximum length in Java

String input = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";

//text copied from http://www.nationalgeographic.com/community/terms/

I want to split this large string into lines, and each line should contain no more than MAX_LINE_LENGTH characters.

What I tried so far:

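int MAX_LINE_LENGTH = 20;
// The constant has to be concatenated into the regex; inside the string literal it is just text.
System.out.print(Arrays.toString(input.split("(?<=\\G.{" + MAX_LINE_LENGTH + "})")));
//maximum length of line 20 characters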

Output:

[THESE TERMS AND COND, ITIONS OF SERVICE (t, he Terms) ARE A LEGA, L AND B ...

This breaks words in the middle, which I don't want. Instead, I want to get output like this:

[THESE TERMS AND , CONDITIONS OF , SERVICE (the Terms) , ARE A LEGAL AND B ...

One more condition: if a word is longer than MAX_LINE_LENGTH, then that word should be split.

And the solution should not rely on external jars.

Just iterate through the string word by word and start a new line whenever a word would pass the limit.

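public String addLinebreaks(String input, int maxLineLength) {
    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // The word (plus its separating space) does not fit: start a new line.
        if (lineLen > 0 && lineLen + 1 + word.length() > maxLineLength) {
            output.append("\n");
            lineLen = 0;
        }
        // The tokenizer drops the spaces, so put one back between words.
        if (lineLen > 0) {
            output.append(' ');
            lineLen++;
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}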

I just typed that in freehand, so you may have to tweak it a bit to make it compile.

Bug: if a word in the input is longer than maxLineLength, it is not split and ends up on an over-long line. I assume your line length is something like 80 or 120 characters, in which case this is unlikely to be a problem.
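The over-long-word case can also be handled with java.util.regex alone. The following is only a minimal sketch, assuming the input contains no line breaks; the class name RegexWrap and the method wrap are hypothetical and not part of the answer above. It tries to match a run of at most maxLineLength characters that ends right before whitespace, and falls back to cutting off exactly maxLineLength characters only when a single word cannot fit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWrap {

    // Hypothetical helper (an assumption for illustration, not from the answer above).
    static List<String> wrap(String input, int maxLineLength) {
        if (maxLineLength < 1) {
            throw new IllegalArgumentException("maxLineLength must be at least 1");
        }
        // Alternative 1: up to maxLineLength characters, starting with a non-space
        //                and ending right before whitespace or the end of the input.
        // Alternative 2: exactly maxLineLength non-space characters; only reached
        //                when a single word is too long to fit on one line.
        Pattern line = Pattern.compile(
                "\\S.{0," + (maxLineLength - 1) + "}(?=\\s|$)|\\S{" + maxLineLength + "}");
        List<String> lines = new ArrayList<>();
        Matcher m = line.matcher(input);
        while (m.find()) {
            // trim() drops a trailing space that can be captured before a run of spaces.
            lines.add(m.group().trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints [THESE TERMS AND, CONDITIONS OF, SERVICE (the Terms)]
        System.out.println(wrap("THESE TERMS AND CONDITIONS OF SERVICE (the Terms)", 20));
        // prints [abcd, efgh, ij, xy]
        System.out.println(wrap("abcdefghij xy", 4));
    }
}
```

Because the pattern is rebuilt on every call, a caller that wraps many strings with the same width would probably want to cache the compiled Pattern.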

Best: use Apache Commons Lang:

org.apache.commons.lang.WordUtils

/**
 * <p>Wraps a single line of text, identifying words by <code>' '</code>.</p>
 * 
 * <p>New lines will be separated by the system property line separator.
 * Very long words, such as URLs will <i>not</i> be wrapped.</p>
 * 
 * <p>Leading spaces on a new line are stripped.
 * Trailing spaces are not stripped.</p>
 *
 * <pre>
 * WordUtils.wrap(null, *) = null
 * WordUtils.wrap("", *) = ""
 * </pre>
 *
 * @param str  the String to be word wrapped, may be null
 * @param wrapLength  the column to wrap the words at, less than 1 is treated as 1
 * @return a line with newlines inserted, <code>null</code> if null input
 */
public static String wrap(String str, int wrapLength) {
    return wrap(str, wrapLength, null, false);
}

Thanks, Barend Garvelink, for your answer. I have modified the above code to fix the bug that occurs when a word in the input is longer than maxCharInLine:

public String[] splitIntoLine(String input, int maxCharInLine){

    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // Split words that are longer than a whole line.
        while(word.length() > maxCharInLine){
            // lineLen counts the trailing space, so it can exceed maxCharInLine;
            // clamp to 0 to avoid a negative substring index.
            int room = Math.max(0, maxCharInLine - lineLen);
            output.append(word.substring(0, room) + "\n");
            word = word.substring(room);
            lineLen = 0;
        }

        if (lineLen + word.length() > maxCharInLine) {
            output.append("\n");
            lineLen = 0;
        }
        output.append(word + " ");

        lineLen += word.length() + 1;
    }
    return output.toString().split("\n");
}

You can use the WordUtils.wrap method of Apache Commons Lang:

import java.util.*;
import org.apache.commons.lang3.text.WordUtils;

public class test3 {

    public static void main(String[] args) {

        String S = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
        String F = WordUtils.wrap(S, 20);
        String[] F1 = F.split(System.lineSeparator());
        System.out.println(Arrays.toString(F1));

    }
}

Output:

   [THESE TERMS AND, CONDITIONS OF, SERVICE (the Terms), ARE A LEGAL AND, BINDING AGREEMENT, BETWEEN YOU AND, NATIONAL GEOGRAPHIC, governing your use, of this site,, www.nationalgeographic.com,, which includes but, is not limited to, products, software, and services offered, by way of the, website such as the, Video Player,, Uploader, and other, applications that, link to these Terms, (the Site). Please, review the Terms, fully before you, continue to use the, Site. By using the, Site, you agree to, be bound by the, Terms. You shall, also be subject to, any additional terms, posted with respect, to individual, sections of the, Site. Please review, our Privacy Policy,, which also governs, your use of the, Site, to understand, our practices. If, you do not agree,, please discontinue, using the Site., National Geographic, reserves the right, to change the Terms, at any time without, prior notice. Your, continued access or, use of the Site, after such changes, indicates your, acceptance of the, Terms as modified., It is your, responsibility to, review the Terms, regularly. The Terms, were last updated on, 18 July 2011.]

Starting from @Barend's suggestion, the following is my final version, with minor modifications:

private static final char NEWLINE = '\n';
private static final String SPACE_SEPARATOR = " ";
//if text has \n, \r or \t symbols it's better to split by \s+
private static final String SPLIT_REGEXP= "\\s+";

public static String breakLines(String input, int maxLineLength) {
    String[] tokens = input.split(SPLIT_REGEXP);
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    for (int i = 0; i < tokens.length; i++) {
        String word = tokens[i];

        if (lineLen + (SPACE_SEPARATOR + word).length() > maxLineLength) {
            if (i > 0) {
                output.append(NEWLINE);
            }
            lineLen = 0;
        }
        if (i < tokens.length - 1 && (lineLen + (word + SPACE_SEPARATOR).length() + tokens[i + 1].length() <=
                maxLineLength)) {
            word += SPACE_SEPARATOR;
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}

System.out.println(breakLines("THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A     LEGAL AND BINDING " +
                "AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing     your use of this site, " +
            "www.nationalgeographic.com, which includes but is not limited to products, " +
            "software and services offered by way of the website such as the Video Player.", 20));

Output:

THESE TERMS AND
CONDITIONS OF
SERVICE (the Terms)
ARE A LEGAL AND
BINDING AGREEMENT
BETWEEN YOU AND
NATIONAL GEOGRAPHIC
governing your use
of this site,
www.nationalgeographic.com,
which includes but
is not limited to
products, software
and services 
offered by way of
the website such as
the Video Player.

I have recently written a few methods to do this. If no whitespace characters are present in one of the lines, they opt for splitting on other non-alphanumeric characters before resorting to a mid-word split.

Here is how it turned out for me:

(Uses the lastIndexOfRegex() methods, which I posted separately and which are included below.)

/**
 * Indicates that a String search operation yielded no results.
 */
public static final int NOT_FOUND = -1;



/**
 * Version of lastIndexOf that uses regular expressions for searching.
 * By Tomer Godinger.
 * 
 * @param str String in which to search for the pattern.
 * @param toFind Pattern to locate.
 * @return The index of the requested pattern, if found; NOT_FOUND (-1) otherwise.
 */
public static int lastIndexOfRegex(String str, String toFind)
{
    Pattern pattern = Pattern.compile(toFind);
    Matcher matcher = pattern.matcher(str);

    // Default to the NOT_FOUND constant
    int lastIndex = NOT_FOUND;

    // Search for the given pattern
    while (matcher.find())
    {
        lastIndex = matcher.start();
    }

    return lastIndex;
}

/**
 * Finds the last index of the given regular expression pattern in the given string,
 * starting from the given index (and conceptually going backwards).
 * By Tomer Godinger.
 * 
 * @param str String in which to search for the pattern.
 * @param toFind Pattern to locate.
 * @param fromIndex Maximum allowed index.
 * @return The index of the requested pattern, if found; NOT_FOUND (-1) otherwise.
 */
public static int lastIndexOfRegex(String str, String toFind, int fromIndex)
{
    // Limit the search by searching on a suitable substring
    return lastIndexOfRegex(str.substring(0, fromIndex), toFind);
}

/**
 * Breaks the given string into lines as best possible, each of which no longer than
 * <code>maxLength</code> characters.
 * By Tomer Godinger.
 * 
 * @param str The string to break into lines.
 * @param maxLength Maximum length of each line.
 * @param newLineString The string to use for line breaking.
 * @return The resulting multi-line string.
 */
public static String breakStringToLines(String str, int maxLength, String newLineString)
{
    StringBuilder result = new StringBuilder();
    while (str.length() > maxLength)
    {
        // Attempt to break on whitespace first,
        int breakingIndex = lastIndexOfRegex(str, "\\s", maxLength);

        // Then on other non-alphanumeric characters,
        if (breakingIndex == NOT_FOUND) breakingIndex = lastIndexOfRegex(str, "[^a-zA-Z0-9]", maxLength);

        // And if all else fails, break in the middle of the word
        // (maxLength - 1, because the append below takes breakingIndex + 1 characters)
        if (breakingIndex == NOT_FOUND) breakingIndex = maxLength - 1;

        // Append each prepared line to the builder
        result.append(str.substring(0, breakingIndex + 1));
        result.append(newLineString);

        // And start the next line
        str = str.substring(breakingIndex + 1);
    }

    // Check if there are any residual characters left
    if (str.length() > 0)
    {
        result.append(str);
    }

    // Return the resulting string
    return result.toString();
}

My version (the previous ones were not working for me):

public static List<String> breakSentenceSmart(String text, int maxWidth) {

    StringTokenizer stringTokenizer = new StringTokenizer(text, " ");
    List<String> lines = new ArrayList<String>();
    StringBuilder currLine = new StringBuilder();
    while (stringTokenizer.hasMoreTokens()) {
        String word = stringTokenizer.nextToken();

        boolean wordPut=false;
        while (!wordPut) {
            if(currLine.length()+word.length()==maxWidth) { //exactly fits -> dont add the space
                currLine.append(word);
                wordPut=true;
            }
            else if(currLine.length()+word.length()<=maxWidth) { //whole word can be put
                if(stringTokenizer.hasMoreTokens()) {
                    currLine.append(word + " ");
                }else{
                    currLine.append(word);
                }
                wordPut=true;
            }else{
                if(word.length()>maxWidth) {
                    int lineLengthLeft = maxWidth - currLine.length();
                    String firstWordPart = word.substring(0, lineLengthLeft);
                    currLine.append(firstWordPart);
                    //lines.add(currLine.toString());
                    word = word.substring(lineLengthLeft);
                    //currLine = new StringBuilder();
                }
                lines.add(currLine.toString());
                currLine = new StringBuilder();
            }

        }
        //
    }
    if(currLine.length()>0) { //add whats left
        lines.add(currLine.toString());
    }
    return lines;
}

Since Java 8, you can also use Streams to tackle such problems.

Below is a full example that uses a reduction via the .collect() method.

I think this one should be shorter than the other non-third-party solutions.

private static String multiLine(String longString, String splitter, int maxLength) {
    return Arrays.stream(longString.split(splitter))
            .collect(
                ArrayList<String>::new,     
                (l, s) -> {
                    Function<ArrayList<String>, Integer> id = list -> list.size() - 1;
                    if(l.size() == 0 || (l.get(id.apply(l)).length() != 0 && l.get(id.apply(l)).length() + s.length() >= maxLength)) l.add("");
                    l.set(id.apply(l), l.get(id.apply(l)) + (l.get(id.apply(l)).length() == 0 ? "" : splitter) + s);
                },
                (l1, l2) -> l1.addAll(l2))
            .stream().reduce((s1, s2) -> s1 + "\n" + s2).get();
}

public static void main(String[] args) {
    String longString = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
    String SPLITTER = " ";
    int MAX_LENGTH = 20;
    System.out.println(multiLine(longString, SPLITTER, MAX_LENGTH));
}
