Java: Separate a String Using a List of Words

How can I split a string using a predefined list of words, separating the words with spaces?

For example:

Word list: words = {"hello", "how", "are", "you"}

String I want to split: text = "hellohowareyou"

public static String separateText(String text, List<String> words) {
    String new_text = "";

    for (String word : words) {
        if (text.startsWith(word)) {
            String suffix = text.substring(word.length());  // 'suffix' is 'text' without its first word
            new_text += " " + word;  // add the first word of 'text'
            separateText(suffix, words);  // note: the result of this recursive call is discarded
        }
    }

    return new_text;
}

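new_text should end up as: hello how are you

Note that the words in the list may come in a different order, and the list may contain many more words, like a dictionary.

How can I make this recursive, if that is needed?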

This solution is very simple, but it is not memory-efficient, because it creates many new Strings:

public static String separate(String str, Set<String> words) {
    for (String word : words)
        str = str.replace(word, word + ' ');

    return str.trim();
}

Demo

Set<String> words = Set.of("hello", "how", "are", "you");
System.out.println(separate("wow hellohowareyouhellohowareyou", words));
// wow hello how are you hello how are you
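One caveat worth checking with the replace-based version: String.replace rewrites every occurrence of a word, including occurrences inside other words. The demo above is unaffected because none of its words occurs inside another. The sketch below runs the same loop, but it takes a List so that the iteration order is deterministic (Set.of makes no ordering guarantee). The class name ReplaceDemo and the extra dictionary word "a" are hypothetical and only serve as illustration.

```java
import java.util.List;

class ReplaceDemo {
    // Same idea as the replace-based answer: put a space after every occurrence of each word.
    static String separate(String str, List<String> words) {
        for (String word : words)
            str = str.replace(word, word + " ");
        return str.trim();
    }

    public static void main(String[] args) {
        // "a" is also a dictionary word, so the "a" inside "are" gets split off as well
        System.out.println(separate("hellohowareyou", List.of("how", "are", "you", "hello", "a")));
        // prints: hello how a re you
    }
}
```

If the dictionary can contain words that occur inside other words, this approach needs a position-by-position scan instead.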

Another solution, which uses StringBuilder and looks better to me from a performance point of view:

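public static String separate(String str, Set<String> words) {
    List<String> res = new ArrayList<>();
    StringBuilder buf = new StringBuilder();

    for (int i = 0; i < str.length(); i++) {
        buf.append(str.charAt(i));
        String token = buf.toString().trim();

        // flush the buffer at a space, or as soon as it spells a dictionary word
        if (str.charAt(i) == ' ' || words.contains(token)) {
            if (!token.isEmpty())
                res.add(token);
            buf.setLength(0);
        }
    }

    // keep trailing characters that did not form a dictionary word instead of dropping them
    String rest = buf.toString().trim();
    if (!rest.isEmpty())
        res.add(rest);

    return String.join(" ", res);
}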

This should do what you want:

public static String separateText(String text, List<String> words) {
    StringBuilder newTextBuilder = new StringBuilder();

    outerLoop:
    while (text.length() > 0) {
        for (String word : words) {
            if (text.startsWith(word)) {
                newTextBuilder.append(word).append(' ');
                text = text.substring(word.length());
                continue outerLoop;
            }
        }
        // no word matches the start of the remaining text: stop instead of looping forever
        break;
    }

    return newTextBuilder.toString().trim();
}

How can I split a string using a predefined list of words, separating the words with spaces?

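You're almost there. Check whether the remaining text starts with any word from the list, remove that starting word, and keep the suffix.

You already do all of this. However, instead of simply keeping the suffix and continuing to iterate, you decided to try calling separateText recursively.

That is also possible, but simply iterating in a while loop until the suffix (the remaining text) is empty is enough.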

    public String separateText(String text, List<String> words){

        String new_text = "";

        while (!text.isEmpty()) {
            boolean matched = false;
            for (String word : words) {
                if (text.startsWith(word)) {
                    // 'text' becomes previous 'text' without its first word
                    text = text.substring(word.length());
                    new_text += " " + word;  // append the matched word
                    matched = true;
                    break;
                }
            }
            // no word matches the remaining text: stop instead of looping forever
            if (!matched) {
                break;
            }
        }

        return new_text.trim();  // drop the leading space
    }

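Here is a possible recursive solution.

It covers the case where the word list contains both "hell" and "hello". For each word, you decide whether or not to use it. The stopping condition checks whether every word in the new string is present in the word list.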

import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static String separateWords(String seed, List<String> dictionary, int index) {

        // Base case: every dictionary word has been either used or skipped.
        // Accept the candidate only if all of its space-separated tokens are dictionary words.
        if (index == dictionary.size()) {
            boolean allKnown = Arrays.stream(seed.trim().split(" ")).allMatch(dictionary::contains);
            return allKnown ? seed : "";
        }
        String word = dictionary.get(index);
        // Branch 1: use this word, i.e. put a space after its first occurrence
        String current = seed.replaceFirst(Pattern.quote(word), Matcher.quoteReplacement(word + " "));
        String withWord = separateWords(current, dictionary, index + 1);
        // Branch 2: skip this word
        String withoutWord = separateWords(seed, dictionary, index + 1);
        // A failed branch returns "", so keep the longer result
        if (withoutWord.length() > withWord.length()) return withoutWord;
        return withWord;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hello", "how", "are", "you");
        String text = "hellohowareyou";
        String result = separateWords(text, words, 0);
        System.out.println(result.trim());
    }
}
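In the "hell"/"hello" case, the only real requirement is that every piece of the result is a dictionary word. A plain backtracking search over prefixes may therefore be simpler than trying each subset of the dictionary. The sketch below assumes that returning null is an acceptable way to signal "no split exists". The names BacktrackSplit and segment are hypothetical.

```java
import java.util.List;

class BacktrackSplit {
    // Returns the words of 'text' separated by spaces, or null if 'text'
    // cannot be split entirely into words from 'dictionary'.
    static String segment(String text, List<String> dictionary) {
        if (text.isEmpty()) {
            return "";
        }
        for (String word : dictionary) {
            // skip empty words, which would otherwise recurse forever
            if (!word.isEmpty() && text.startsWith(word)) {
                // try this word; if the rest cannot be split, fall through to the next word
                String rest = segment(text.substring(word.length()), dictionary);
                if (rest != null) {
                    return rest.isEmpty() ? word : word + " " + rest;
                }
            }
        }
        return null;  // dead end: no word leads to a complete split
    }

    public static void main(String[] args) {
        List<String> words = List.of("hell", "hello", "how", "are", "you");
        System.out.println(segment("hellohowareyou", words));
        // prints: hello how are you
    }
}
```

Here "hell" is tried first. It leaves "ohowareyou", which no word can start, so the search backs up and tries "hello". The search also handles repeated words, but in the worst case it can take exponential time on long inputs.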

For a recursive approach, try the following:

public static String separateText(String text, List<String> words){
    return separateText(text, words, new StringBuilder());
}

// Appends the first word that prefixes 'text' to 'result', then recurses on the rest.
// The matched word is removed from the list passed down, so each word is used at most once.
public static String separateText(String text, List<String> words, StringBuilder result){

    for(String word : words){
        if (text.startsWith(word)){
           result.append(word).append(" ");
           text = text.substring(word.length());
           List<String> newList = new ArrayList<>(words);
           newList.remove(word);
           separateText(text, newList, result);
           break;
        }
    }

    return result.toString().trim();
}
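With a large dictionary, the recursive variants can re-examine the same suffix many times. A common remedy, swapped in here and not taken from the answers above, is the classic "word break" dynamic program. For each position it records the start of a dictionary word that ends there, but only when the text before that word can itself be split. The sketch below assumes the dictionary fits in a HashSet. The names WordBreak and segment are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordBreak {
    // Returns 'text' split into dictionary words separated by spaces,
    // or null if no such split exists. Does at most n(n+1)/2 substring lookups.
    static String segment(String text, List<String> words) {
        Set<String> dict = new HashSet<>(words);
        int n = text.length();
        // start[end] = index where the last word of a valid split of text[0, end) begins,
        // or -1 if text[0, end) cannot be split.
        int[] start = new int[n + 1];
        Arrays.fill(start, -1);
        start[0] = 0;  // the empty prefix is trivially splittable

        for (int end = 1; end <= n; end++) {
            for (int begin = 0; begin < end; begin++) {
                if (start[begin] != -1 && dict.contains(text.substring(begin, end))) {
                    start[end] = begin;
                    break;
                }
            }
        }
        if (start[n] == -1) {
            return null;
        }

        // Walk back from the end to recover the words, then restore their order.
        List<String> parts = new ArrayList<>();
        for (int end = n; end > 0; end = start[end]) {
            parts.add(text.substring(start[end], end));
        }
        Collections.reverse(parts);
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        List<String> words = List.of("hell", "hello", "how", "are", "you");
        System.out.println(segment("hellohowareyou", words));
        // prints: hello how are you
    }
}
```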

import java.util.*;

public class Main {
    public static void main(String[] args) {
        // You must sort this list by length (longest first); otherwise a shorter
        // word may match first and produce an incorrect result.
        // In this example, the list is already sorted.
        List<String> words = Arrays.asList("hello", "how", "are", "you");
        List<String> detectedWords = new ArrayList<>();
        String text = "hellohowareyou";
        int i = 0;
        while (i < text.length()) {
            Optional<String> wordOpt = Optional.empty();

            for (String word : words) {
                // the word must start exactly at position i, not anywhere after it
                if (text.startsWith(word, i)) {
                    wordOpt = Optional.of(word);
                    break;
                }
            }
            if (wordOpt.isPresent()) {
                String wordFound = wordOpt.get();
                i += wordFound.length();
                detectedWords.add(wordFound);
            } else {
                // no word matches at position i: stop instead of looping forever
                break;
            }
        }
        String result = String.join(" ", detectedWords);
        System.out.println(result);
    }
}

I assumed that:

  • your text is never null
  • your text matches the regular expression ^(hello|how|are|you)+$
  • your word list is sorted (by length, longest first)
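The third assumption, a sorted word list, can be enforced in code rather than by hand. The sketch below takes "sorted" to mean longest word first, so the scan prefers "hello" over "hell". The names GreedySplit and separate are hypothetical. Greedy longest-match can still miss a split that exists. For example, with {"ab", "abc", "cd"} it takes "abc" from "abcd" and then gets stuck on "d", although "ab cd" would work. Backtracking or dynamic programming is needed to handle such cases.

```java
import java.util.ArrayList;
import java.util.List;

class GreedySplit {
    // Greedy longest-match: at each position, take the longest dictionary word starting there.
    // Returns null if some position is not the start of any dictionary word.
    static String separate(String text, List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));  // longest first

        List<String> detected = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String found = null;
            for (String word : sorted) {
                if (!word.isEmpty() && text.startsWith(word, i)) {
                    found = word;
                    break;
                }
            }
            if (found == null) {
                return null;  // nothing in the dictionary starts at position i
            }
            detected.add(found);
            i += found.length();
        }
        return String.join(" ", detected);
    }

    public static void main(String[] args) {
        System.out.println(separate("hellohowareyou", List.of("how", "hell", "you", "hello", "are")));
        // prints: hello how are you
        System.out.println(separate("abcd", List.of("ab", "abc", "cd")));
        // prints: null (greedy takes "abc" and cannot continue)
    }
}
```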


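Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.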

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM