
Reassemble split string based on previous split in Java?

If I split a string, say like this:

List<String> words = Arrays.asList(input.split("\\s+"));

If I then wanted to modify those words in various ways and reassemble them using the same logic (assuming no word lengths have changed), is there an easy way to do that? Humor me: there's a reason I'm doing this.

Note: I need to match all whitespace, not just spaces. Hence the regex.

e.g.:

"Beautiful Country" -> ["Beautiful", "Country"] -> ["BEAUTIFUL", "COUNTRY"] -> "BEAUTIFUL COUNTRY"

If you use String.split, there is no way to be sure that the reassembled string will be the same as the original.

In general (and in your case) there is no way to capture which separators were actually used. In your example, the regex "\\s+" matches one or more whitespace characters, but the resulting array doesn't record which characters were matched or how many there were.

When you use split, the information about the separators is lost. Period.
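A quick way to see this (a minimal check; the class name SplitLosesSeparators is just for illustration): two inputs that differ only in their separators split to identical arrays, so the arrays alone cannot tell you which original to rebuild.

```java
import java.util.Arrays;

public class SplitLosesSeparators {
    public static void main(String[] args) {
        // Two inputs that differ only in the whitespace between the words.
        String spaced = "Beautiful Country";
        String mixed = "Beautiful\t \nCountry";

        String[] a = spaced.split("\\s+");
        String[] b = mixed.split("\\s+");

        // Both arrays are [Beautiful, Country]: split() keeps only the words,
        // so nothing records which separator characters were consumed.
        System.out.println(Arrays.toString(a));  // [Beautiful, Country]
        System.out.println(Arrays.toString(b));  // [Beautiful, Country]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```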

(On the other hand, if you don't care that the reassembled string may have a different length or different separators from the original, use the Joiner class ...)
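For comparison, this is what the lossy approach looks like. The sketch uses the JDK's String.join (Java 8+) in place of a third-party Joiner; the class and method names are hypothetical. A run of tabs collapses to a single space, so the output can be shorter than the input.

```java
public class LossyJoin {
    // Splits on runs of whitespace, upper-cases each word, and rejoins the
    // words with a single space. The original separators are NOT preserved.
    static String upperCaseWords(String input) {
        String[] words = input.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            words[i] = words[i].toUpperCase();
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String input = "Beautiful\t\tCountry";
        String result = upperCaseWords(input);
        System.out.println(result);                                    // BEAUTIFUL COUNTRY
        System.out.println(input.length() + " vs " + result.length()); // 18 vs 17
    }
}
```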

Assuming you have a limit on how many words to expect, you could try writing a regular expression like

(\S+)(\s+)?(\S+)?(\s+)?(\S+)?

(for the case in which you expect up to three words). You could then use the Matcher methods groupCount() and group(n) to pull out the individual words (the odd-numbered groups) and the whitespace separators (the even-numbered groups; group 0 is the whole match), do what you need with the words, and reassemble them again...
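A minimal sketch of that idea for up to three words, assuming upper-casing is the transformation you want (GroupReassemble and upperCaseKeepingSeparators are made-up names). Optional groups that did not take part in the match return null from group(n), so those are skipped. As written, the pattern allows no leading or trailing whitespace, and any input that doesn't match returns null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupReassemble {
    // Up to three words: odd groups (1, 3, 5) are words,
    // even groups (2, 4) are the whitespace separators between them.
    private static final Pattern UP_TO_THREE_WORDS =
            Pattern.compile("(\\S+)(\\s+)?(\\S+)?(\\s+)?(\\S+)?");

    // Upper-cases the words and copies the original separators unchanged.
    // Returns null if the whole input does not match (e.g. four or more words).
    static String upperCaseKeepingSeparators(String input) {
        Matcher m = UP_TO_THREE_WORDS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            String part = m.group(g);
            if (part == null) {
                continue; // optional group did not participate in the match
            }
            // Odd groups are words: transform them. Even groups are separators: keep as-is.
            sb.append(g % 2 == 1 ? part.toUpperCase() : part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseKeepingSeparators("Beautiful\t Country"));
        System.out.println(upperCaseKeepingSeparators("too many words here")); // null
    }
}
```

For more words you would have to extend the pattern, which is why this only makes sense when the word count is bounded.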

I tried this:

import java.util.*;
import java.util.stream.*;
public class StringSplits {
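    // Filled by processInput: whitespaceWords.get(i) holds the whitespace that
    // followed words.get(i) (possibly "" after the last word). Shared across calls.
    private static List<String> whitespaceWords = new ArrayList<>();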
    public static void main(String [] args) {
        String input = "What a Wonderful World! ...";
        List<String> words = processInput(input);
        // First transformation: ["What", "a", "Wonderful", "World!", "..."]
        String first = words.stream()
                             .collect(Collectors.joining("\", \"", "[\"", "\"]"));
        System.out.println(first);
        // Second transformation: ["WHAT", "A", "WONDERFUL", "WORLD!", "..."]
        String second = words.stream()
                              .map(String::toUpperCase)
                              .collect(Collectors.joining("\", \"", "[\"", "\"]"));
        System.out.println(second);
        // Final transformation: WHAT A WONDERFUL WORLD! ...
        String last = IntStream.range(0, words.size())
                                .mapToObj(i -> words.get(i) + whitespaceWords.get(i))
                                .map(String::toUpperCase)
                                .collect(Collectors.joining());
        System.out.println(last);
    }

    /*
     * Accepts an input string of words separated by whitespace
     * (as defined by the method Character#isWhitespace).
     * Returns only the non-whitespace words, and stores each whitespace
     * 'word' (one or more consecutive whitespace characters) in whitespaceWords.
     * NOTE: This method uses String concatenation in a loop. For processing
     * large inputs, consider using a StringBuilder.
     */
    private static List<String> processInput(String input) {
        List<String> words = new ArrayList<>();
        String word = "";
        String whitespaceWord = "";
        boolean wordFlag = true;
        for (char c : input.toCharArray()) {
            if (! Character.isWhitespace(c)) {
                if (! wordFlag) {
                    wordFlag = true;
                    whitespaceWords.add(whitespaceWord);
                    word = whitespaceWord = "";
                }
                word = word + String.valueOf(c);
            }   
            else {
                if (wordFlag) {
                    wordFlag = false;
                    words.add(word);
                    word = whitespaceWord = "";
                }
                whitespaceWord = whitespaceWord + String.valueOf(c);
            }
        } // end-for
        whitespaceWords.add(whitespaceWord);    
        if (! word.isEmpty()) {
            words.add(word);
        }
        return words;
    }
}
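The NOTE in the comment suggests a StringBuilder for large inputs. A sketch of that variant follows; it also returns the separators through a list supplied by the caller instead of the static whitespaceWords field, so calling it twice doesn't mix results. It keeps the same Character#isWhitespace classification. The names StringSplitsBuilder, split and join are illustrative only, and one behavior differs slightly: an empty input yields a single empty word rather than an empty list.

```java
import java.util.ArrayList;
import java.util.List;

public class StringSplitsBuilder {
    // Splits input into words (non-whitespace runs) and the whitespace that
    // follows each one. Invariant: separators.get(i) is the whitespace after
    // words.get(i); leading whitespace produces an empty first word.
    static List<String> split(String input, List<String> separators) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        StringBuilder space = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                space.append(c);
            } else {
                if (space.length() > 0) {
                    // A whitespace run just ended: close off the preceding word.
                    words.add(word.toString());
                    separators.add(space.toString());
                    word.setLength(0);
                    space.setLength(0);
                }
                word.append(c);
            }
        }
        // Flush the last word and whatever whitespace trailed it (possibly "").
        words.add(word.toString());
        separators.add(space.toString());
        return words;
    }

    // Rebuilds the string: word 0, separator 0, word 1, separator 1, ...
    static String join(List<String> words, List<String> separators) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            sb.append(words.get(i)).append(separators.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "What a\tWonderful  World! ...";
        List<String> separators = new ArrayList<>();
        List<String> words = split(input, separators);
        words.replaceAll(String::toUpperCase);
        // Prints the upper-cased text with the tab and the double space intact.
        System.out.println(join(words, separators));
    }
}
```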

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 