How to split a word into two words by adding a space between adjacent characters

I am trying to take the word missspelling and split it into two words by adding a " " (space) between adjacent characters, so that I get miss spelling as a result. Any guidance would help; I have been trying out different code but have not seen results.

The code that works for the other suggestions is included for reference only. *Note that the code under the "Need help here" comment is what I have been messing with to try to get the correct result.

    /**
     * Returns possible suggestions for misspelled word
     * 
     * @param tree The Trie that will be checked
     * @param word The word in trie that is checked
     */
    public static void suggest(TrieNode tree, String word) {
        Set<String> result = new HashSet<>();
        System.out.println("Suggestions: ");
        // Remove a character
        for (int i = 0; i < word.length(); ++i)
            result.add(word.substring(0, i) + word.substring(i + 1));
        // Swap two consecutive characters
        for (int i = 0; i < word.length() - 1; ++i)
            result.add(word.substring(0, i) + word.substring(i + 1, i + 2) + word.substring(i, i + 1)
                    + word.substring(i + 2));
        // Replace a character with other
        for (int i = 0; i < word.length(); ++i)
            for (char c = 'a'; c <= 'z'; ++c)
                result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i + 1));
        // Add a new character
        for (int i = 0; i <= word.length(); ++i)
            for (char c = 'a'; c <= 'z'; ++c)
                result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i));
        // Split word into pair of words by adding a " " between adjacent pairs
        // Need help here
        for (int i = 0; i < word.length(); ++i)
            for (char c = ' '; c <= ' '; ++c)
                if (search(tree, word.substring(0, i)) && search(tree, word.substring(i)) == true)
                     result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i));


        ArrayList<String> res = new ArrayList<>(result);
        int j = 0;
        for (int i = 0; i < result.size(); i++)
            if (search(tree, res.get(i))) {
                if (j == 0)
                    System.out.print("[");
                System.out.print(res.get(i) + ",");
                System.out.print("");
                j++;
            }
         System.out.print("]" + "\n");
    }

I wrote a minimal, runnable piece of code that splits a word if both pieces are found in the dictionary.

Here are my test results:

    miss spelling
    apple

And here's the code. The important method is the splitWord method.

    package com.ggl.testing;

    import java.util.ArrayList;
    import java.util.List;

    public class DoubleWord implements Runnable {

        public static void main(String[] args) {
            new DoubleWord().run();
        }

        @Override
        public void run() {
            Dictionary dictionary = new Dictionary();
            System.out.println(splitWord("missspelling", dictionary));
            System.out.println(splitWord("apple", dictionary));
        }

        public String splitWord(String word, Dictionary dictionary) {
            for (int index = 1; index < word.length(); index++) {
                String prefix = word.substring(0, index);
                if (dictionary.isWordInDictionary(prefix)) {
                    String suffix = word.substring(index);
                    if (dictionary.isWordInDictionary(suffix)) {
                        return prefix + " " + suffix;
                    }
                }
            }

            return word;
        }

        public class Dictionary {
            private List<String> words;

            public Dictionary() {
                this.words = setWords();
            }

            public boolean isWordInDictionary(String word) {
                return words.contains(word);
            }

            private List<String> setWords() {
                List<String> words = new ArrayList<>();
                words.add("apple");
                words.add("miss");
                words.add("spelling");
                words.add("zebra");

                return words;
            }
        }

    }

A couple of things first...

This line is insane:

    for (char c = ' '; c <= ' '; ++c)

It will iterate exactly once and is equivalent to:

    char c = ' ';
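With that loop collapsed to a constant, the split step can be sketched roughly as below. This is only an illustration under assumptions: the `SplitSuggestions` class and the `Set<String>` dictionary are hypothetical stand-ins for the asker's `search(tree, word)` lookup, which is not shown. Note also that the question's final loop calls `search(tree, res.get(i))` on every candidate, so a two-word candidate such as "miss spelling" would presumably be dropped there unless the trie stores it as a single entry; keeping split results in their own collection avoids that.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class SplitSuggestions {

    // Collects every "prefix suffix" pair whose halves are both known words.
    // `dictionary` is a hypothetical stand-in for the asker's search(tree, word).
    static Set<String> splits(String word, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        // Start at 1 and stop before word.length() so neither half is empty.
        for (int i = 1; i < word.length(); i++) {
            String prefix = word.substring(0, i);
            String suffix = word.substring(i);
            if (dictionary.contains(prefix) && dictionary.contains(suffix)) {
                found.add(prefix + " " + suffix);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("miss", "spelling", "apple");
        System.out.println(splits("missspelling", dictionary)); // [miss spelling]
    }
}
```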

You are reinventing the wheel by trying to find valid words by swapping characters and then substituting characters. Read about Levenshtein distance, implement that algorithm, then sort your dictionary by Levenshtein distance from your input to find the "best matches". Filter those matches by a maximum Levenshtein distance; perhaps 3 would be a good starting point (test your code and see whether the results are reasonable).
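As a rough sketch of that suggestion, the following computes Levenshtein distance with the usual dynamic-programming recurrence and then filters and sorts a word list by it. The class name `LevenshteinSuggester`, the `List<String>` dictionary, and the sample words are assumptions made for illustration; the asker's real dictionary lives in a trie.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LevenshteinSuggester {

    // Edit distance (insert, delete, substitute) using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps words within maxDistance of the input, closest first.
    static List<String> suggest(String input, List<String> dictionary, int maxDistance) {
        List<String> matches = new ArrayList<>();
        for (String w : dictionary) {
            if (distance(input, w) <= maxDistance) {
                matches.add(w);
            }
        }
        matches.sort(Comparator.comparingInt((String w) -> distance(input, w)));
        return matches;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("spilling", "spelling", "zebra");
        System.out.println(suggest("speling", dictionary, 3));
    }
}
```

Scanning the whole dictionary for every input is linear in the dictionary size, which may or may not be acceptable for your word list.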


Your TrieNode should have a search() method, rather than your search() method accepting a trie and a word, but that's more a matter of design and isn't your biggest problem.
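To illustrate the design point, here is a minimal sketch of a trie node that owns its own search() method. The asker's actual TrieNode is not shown, so the `children` map, the `endOfWord` flag, and the `insert` method are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class TrieNode {
    // Assumed layout: one child per character plus an end-of-word marker.
    private final Map<Character, TrieNode> children = new HashMap<>();
    private boolean endOfWord;

    // Adds a word below this node, creating child nodes as needed.
    void insert(String word) {
        TrieNode node = this;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
        }
        node.endOfWord = true;
    }

    // Returns true only if this exact word was inserted; bare prefixes do not count.
    boolean search(String word) {
        TrieNode node = this;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.endOfWord;
    }

    public static void main(String[] args) {
        TrieNode root = new TrieNode();
        root.insert("miss");
        root.insert("spelling");
        System.out.println(root.search("miss")); // true
        System.out.println(root.search("mis"));  // false
    }
}
```

Callers then write `root.search(word)` instead of `search(root, word)`.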


Now then, regarding your actual question, attempting to split the input is complicated, but the "answer" is:

Loop through every position between letters in the input and put each "half" through the same process as the whole input, except that you should not do a nested split. Then combine every suggestion for the left half with every suggestion for the right half, and return a collection of all unique suggestion combinations.
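The loop described above could be sketched as follows. The `singleWordSuggestions` function is a hypothetical stand-in for the non-splitting part of suggest() (remove, swap, replace, add, filtered against the dictionary); here it is passed in as a parameter so the combining logic can be shown on its own.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

final class SplitCombiner {

    // Pairs every suggestion for the left half with every suggestion for the
    // right half, at every split position. No nested splitting is attempted.
    static Set<String> splitSuggestions(String input,
                                        Function<String, Set<String>> singleWordSuggestions) {
        Set<String> combined = new LinkedHashSet<>(); // a set keeps combinations unique
        for (int i = 1; i < input.length(); i++) {
            Set<String> left = singleWordSuggestions.apply(input.substring(0, i));
            Set<String> right = singleWordSuggestions.apply(input.substring(i));
            for (String l : left) {
                for (String r : right) {
                    combined.add(l + " " + r);
                }
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("miss", "spelling");
        // Simplest possible suggester: a fragment suggests itself if it is a word.
        Function<String, Set<String>> exact = s -> {
            Set<String> out = new LinkedHashSet<>();
            if (dictionary.contains(s)) {
                out.add(s);
            }
            return out;
        };
        System.out.println(splitSuggestions("missspelling", exact)); // [miss spelling]
    }
}
```

With a real suggester, each position contributes the product of the two halves' suggestion counts, which is where the growth in results comes from.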

However, doing this will result in a "very large" number of suggestions and thus will not scale, so you probably shouldn't do it.
