Using a user-inputted string of characters, find the longest word that can be made

Basically I want to create a program which simulates the 'Countdown' game on Channel 4. In effect, a user must input 9 letters and the program will search for the longest word in the dictionary that can be made from these letters. I think a tree structure would be a better choice than hash tables. I already have a file which contains the words in the dictionary and will be using file IO.

This is my file IO class:

public static void main(String[] args){
     FileIO reader = new FileIO();
     String[] contents = reader.load("dictionary.txt");
}

This is what I have so far in my Countdown class:

public static void main(String[] args) throws IOException{
     Scanner scan = new Scanner(System.in);
     String letters = scan.nextLine();
}

I get totally lost from here. I know this is only the start, but I'm not looking for answers. I just want a small bit of help and maybe a pointer in the right direction. I'm new to Java, found this question in an interview book, and thought I should give it a go.

Thanks in advance

Welcome to the world of Java :)

The first thing I see is that you have two main methods; you don't actually need that. In most cases your program will have a single entry point, from which it does all its logic and handles user input and everything else.

You're thinking of a tree structure, which is good, though there might be a better way to store this. Try this: http://en.wikipedia.org/wiki/Trie

What your program has to do is read all the words from the file line by line and, in the process, build your data structure, the tree. When that's done you can ask the user for input, and once the input is entered you can search the tree.

Since you asked specifically not to be given answers I won't put code here, but feel free to ask if you're unclear about something.

There are only about 800,000 words in the English language, so an efficient solution would be to store those 800,000 words as 800,000 arrays of 26 1-byte integers, each counting how many times a letter is used in the word. For an input of 9 characters, you convert the query to the same 26-integer count format. A word can then be formed from the query letters if the query vector is greater than or equal to the word vector component-wise. You could easily process on the order of 100 queries per second this way.
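As a rough illustration of the count-vector idea above, here is a minimal sketch. The class name `LetterCountIndex` and its methods are made up for this example, and it assumes the word list contains plain words whose letters are a-z (other characters are simply ignored when counting).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LetterCountIndex {
    private final String[] words;
    private final byte[][] counts; // counts[i][k] = occurrences of letter k in words[i]

    // Precompute one 26-entry count vector per dictionary word.
    LetterCountIndex(List<String> wordList) {
        words = wordList.toArray(new String[0]);
        counts = new byte[words.length][];
        for (int i = 0; i < words.length; i++) {
            counts[i] = count(words[i]);
        }
    }

    // Count how often each letter a-z occurs; other characters are ignored.
    static byte[] count(String s) {
        byte[] result = new byte[26];
        for (char ch : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                result[ch - 'a']++;
            }
        }
        return result;
    }

    // A word fits if the query has at least as many of every letter.
    static boolean fits(byte[] query, byte[] word) {
        for (int k = 0; k < 26; k++) {
            if (word[k] > query[k]) {
                return false;
            }
        }
        return true;
    }

    // Linear scan over all words; returns "" if nothing can be formed.
    String longest(String letters) {
        byte[] query = count(letters);
        String best = "";
        for (int i = 0; i < words.length; i++) {
            if (words[i].length() > best.length() && fits(query, counts[i])) {
                best = words[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LetterCountIndex index =
                new LetterCountIndex(Arrays.asList("ban", "banal", "banana", "ball"));
        System.out.println(index.longest("aablnxyzq"));
    }
}
```

With the letters `aablnxyzq`, "banana" is rejected because it needs three a's and "ball" because it needs two l's, so the scan settles on "banal".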

I would write a program that starts with all the two-letter words, then does the three-letter words, the four-letter words and so on.

When you do the two-letter words, you'll want some way of picking the first letter, then picking the second letter from what remains. You'll probably want to use recursion for this part. Lastly, you'll check the result against the dictionary. Try to write it in a way that lets you re-use the same code for the three-letter words.
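One possible shape for that recursion is sketched below. The names `PickLetters`, `find` and `longest` are invented for illustration, and the dictionary is assumed to be already loaded into a `Set<String>`. The same `find` method serves every word length because the target length is a parameter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PickLetters {
    // Extend 'prefix' by one unused letter at a time until it reaches 'length',
    // then check the finished candidate against the dictionary.
    static String find(char[] letters, boolean[] used, StringBuilder prefix,
                       int length, Set<String> dictionary) {
        if (prefix.length() == length) {
            String candidate = prefix.toString();
            return dictionary.contains(candidate) ? candidate : null;
        }
        for (int i = 0; i < letters.length; i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;                 // pick this letter
            prefix.append(letters[i]);
            String found = find(letters, used, prefix, length, dictionary);
            prefix.deleteCharAt(prefix.length() - 1);
            used[i] = false;                // put it back for the next attempt
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Try two-letter words first, then three-letter words, and so on,
    // remembering the longest length that produced a match.
    static String longest(String letters, Set<String> dictionary) {
        char[] chars = letters.toCharArray();
        String best = "";
        for (int length = 2; length <= chars.length; length++) {
            String found = find(chars, new boolean[chars.length],
                    new StringBuilder(), length, dictionary);
            if (found != null) {
                best = found;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> dictionary =
                new HashSet<>(Arrays.asList("an", "ban", "banal", "banana"));
        System.out.println(longest("lbanaxq", dictionary));
    }
}
```

This tries every ordering of the letters, which is still manageable for 9 letters but grows quickly with longer inputs.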

I believe the power of regular expressions would come in handy in your case:

1) Create a regular expression string with a character class like /^[abcdefghi]*$/, with your letters inside instead of "abcdefghi".

2) Use that regular expression as a filter to get an array of strings from your text file.

3) Sort it by length. The longest word is what you need!
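The three steps above could look roughly like this in Java. `RegexFilter` and `candidates` are hypothetical names, and the sketch assumes the input letters are a non-empty string of a-z only, so nothing needs escaping inside the character class. Note that a character class only checks which letters appear, not how many times, so the result may still need a per-letter count check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

class RegexFilter {
    static List<String> candidates(List<String> words, String letters) {
        // Step 1: build the character class from the user's letters
        // (assumed to be a-z only, so no regex escaping is needed).
        Pattern allowed = Pattern.compile("^[" + letters + "]*$");

        // Step 2: keep only the words made entirely of those letters.
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (allowed.matcher(word).matches()) {
                matches.add(word);
            }
        }

        // Step 3: sort by length, longest first.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        matches.sort(byLength.reversed());
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ban", "banal", "ball", "cat", "banana");
        System.out.println(candidates(words, "abnl"));
    }
}
```

With the letters `abnl`, "banana" still passes the filter even though only one 'a' was supplied, which shows why the regex alone does not enforce the rule that each letter is used once.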

Check the Regular Expressions Reference for more information.

UPD: Here is a good Java Regex Tutorial.

A first approach could be to use a tree containing all the letters present in the word list.

If a node is the end of a word, it is marked as an end-of-word node.

[Image: tree]

In the picture above, the longest word is banana. But there are other words, like ball, ban, or banal.

So, a node must have:

  1. A character
  2. A flag saying whether it is the end of a word
  3. A list of children (max 26)

The insertion algorithm is very simple: in each step we "cut" the first character off the word, until the word has no more characters.

public class TreeNode {

    public char c;
    private boolean isEndOfWord = false;
    private TreeNode[] children = new TreeNode[26];

    public TreeNode(char c) {
        this.c = c;
    }

    public void put(String s) {
        if (s.isEmpty())
        {
            this.isEndOfWord = true;
            return;
        }
        char first = s.charAt(0);
        int pos = position(first);
        if (this.children[pos] == null)
            this.children[pos] = new TreeNode(first);

        this.children[pos].put(s.substring(1));
    }

    public String search(char[] letters) {
        String word = "";
        String w = "";

        for (int i = 0; i < letters.length; i++)
        {
            TreeNode child = children[position(letters[i])];
            if (child != null)
                w = child.search(letters);
               //this is not efficient. It should be optimized.
            if (w.contains("%")
                    && w.substring(0, w.lastIndexOf("%")).length() > word
                            .length())
                word = w;
        }
            // if this node is an end-of-word node, we add the special char '%'
        return c + (this.isEndOfWord ? "%" : "") + word;
    }
    // maps 'a' to 0, 'b' to 1, and so on (assumes lowercase a-z)
    public static int position(char c) {
        return c - 'a';
    }


}

Example:

public static void main(String[] args) {
    //root
    TreeNode t = new TreeNode('R');
    //for skipping words with "'" in the wordlist
    Pattern p = Pattern.compile(".*\\W+.*");
    int nw = 0;
    try (BufferedReader br = new BufferedReader(new FileReader(
            "files/wordsEn.txt")))
    {
        for (String line; (line = br.readLine()) != null;)
        {
            if (p.matcher(line).find())
                continue;
            t.put(line);
            nw++;
        }
        // 'line' is out of scope here; try-with-resources closes br automatically.
        System.out.println("number of words : " + nw);
        String res = null;
        // substring (1) because of the root
        res = t.search("vuetsrcanoli".toCharArray()).substring(1);
        System.out.println(res.replace("%", ""));
    }

    catch (Exception e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

}

Output:

number of words : 109563
counterrevolutionaries
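The result above uses some letters more often than they appear in the input ("counterrevolutionaries" contains three r's, while "vuetsrcanoli" has only one), because `search` passes the same letters array down at every level. In Countdown each letter can be used only once, so here is a hedged sketch of a variant that keeps a count of the remaining letters. `CountingTrie` and its methods are illustrative names, not part of the code above, and the sketch assumes all words and inputs are lowercase a-z.

```java
class CountingTrie {
    private final CountingTrie[] children = new CountingTrie[26];
    private boolean endOfWord;

    // Insert a word iteratively (assumes lowercase a-z only).
    void put(String word) {
        CountingTrie node = this;
        for (char ch : word.toCharArray()) {
            int pos = ch - 'a';
            if (node.children[pos] == null) {
                node.children[pos] = new CountingTrie();
            }
            node = node.children[pos];
        }
        node.endOfWord = true;
    }

    // Longest suffix below this node that can be spelled with the remaining
    // letter counts, or null if no word ends at or below this node.
    private String longest(int[] remaining) {
        String best = endOfWord ? "" : null;
        for (int pos = 0; pos < 26; pos++) {
            if (children[pos] == null || remaining[pos] == 0) {
                continue;
            }
            remaining[pos]--;   // use one copy of this letter
            String tail = children[pos].longest(remaining);
            remaining[pos]++;   // give it back before trying the next branch
            if (tail != null && (best == null || tail.length() + 1 > best.length())) {
                best = (char) ('a' + pos) + tail;
            }
        }
        return best;
    }

    // Entry point: returns the longest word, or null if none can be formed.
    String longestWord(String letters) {
        int[] remaining = new int[26];
        for (char ch : letters.toCharArray()) {
            remaining[ch - 'a']++;
        }
        return longest(remaining);
    }

    public static void main(String[] args) {
        CountingTrie root = new CountingTrie();
        root.put("counterrevolutionaries");
        root.put("counter");
        System.out.println(root.longestWord("vuetsrcanoli"));
    }
}
```

With only these two words in the trie, the same input now yields "counter", since the longer word would need letters that have already been used up.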

Notes:
