

How to print out the nth most frequent words in a binary search tree?

This concerns "a software algorithm" https://stackoverflow.com/help/on-topic

I am currently writing a word-counter dictionary program. To store the word counts, I am using a binary search tree with the word as the key and its frequency as the value.

Here is my binary search tree class:

public class BinarySearchTree<AnyKey extends Comparable<? super AnyKey>, AnyValue>
        implements MyTreeMap<AnyKey, AnyValue> {

    protected BinaryNode<AnyKey, AnyValue> root;

    // Recursively inserts (x, y) into the subtree rooted at t and returns the
    // new subtree root. Keys are words, values are frequencies; a key that is
    // already present is rejected with an exception.
    protected BinaryNode<AnyKey, AnyValue> insert(AnyKey x, AnyValue y,
                                                  BinaryNode<AnyKey, AnyValue> t) {
        if (t == null)
            t = new BinaryNode<AnyKey, AnyValue>(x, y);
        else if (x.compareTo(t.element) < 0)
            t.left = insert(x, y, t.left);
        else if (x.compareTo(t.element) > 0)
            t.right = insert(x, y, t.right);
        else
            throw new IllegalArgumentException(x.toString());
        return t;
    }

    // ... remaining MyTreeMap methods not shown
}

And here is my node class:

class BinaryNode<AnyKey, AnyValue> {
    BinaryNode(AnyKey theElement, AnyValue theValue) {
        element = theElement;
        value = theValue;
        left = right = null;
    }

    AnyKey element;    // the key (a word)
    AnyValue value;    // the value (its frequency)
    BinaryNode<AnyKey, AnyValue> left;
    BinaryNode<AnyKey, AnyValue> right;
}

I am trying to write this method inside my binary search tree:

@Override
public void PrintMostFrequent(int n) {

}

The method should print out the n most frequent words, ranked by frequency. I have an idea of how to do this in pseudocode:
1. Create a collection to hold nodes.
2. Add all the nodes from the tree to this collection.
3. Sort the collection by count.
4. Iterate over the sorted collection and print out the n most frequent words.
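A minimal sketch of those four steps, assuming the keys are String words and the values int counts. A simplified, non-generic Node stands in for BinaryNode, and the class name, the add helper (which increments an existing count, whereas the question's insert throws on duplicates) and the method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopWordsBySort {
    // Simplified stand-in for BinaryNode<String, Integer>.
    static class Node {
        final String word;
        int count;
        Node left, right;
        Node(String word, int count) { this.word = word; this.count = count; }
    }

    Node root;

    // Hypothetical helper: insert a word, or increment its count if present.
    void add(String word) { root = add(word, root); }

    private Node add(String word, Node t) {
        if (t == null) return new Node(word, 1);
        int cmp = word.compareTo(t.word);
        if (cmp < 0) t.left = add(word, t.left);
        else if (cmp > 0) t.right = add(word, t.right);
        else t.count++;
        return t;
    }

    // Steps 1 and 2: collect every node with an in-order traversal.
    private void collect(Node t, List<Node> out) {
        if (t == null) return;
        collect(t.left, out);
        out.add(t);
        collect(t.right, out);
    }

    // Steps 3 and 4: sort by descending count and keep the first n words.
    List<String> mostFrequent(int n) {
        List<Node> nodes = new ArrayList<>();
        collect(root, nodes);
        nodes.sort(Comparator.comparingInt((Node x) -> x.count).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, nodes.size()); i++) {
            result.add(nodes.get(i).word);
        }
        return result;
    }

    void printMostFrequent(int n) {
        for (String w : mostFrequent(n)) System.out.println(w);
    }
}
```

Because List.sort is stable and the in-order traversal visits words alphabetically, words with equal counts come out in alphabetical order. For N distinct words this costs O(N) extra space and O(N log N) time.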

Is this the best way to solve this problem and write this method? I was afraid that creating a separate collection might be too expensive in space, and that the sorting would be computationally expensive as well.

The method you describe is also pretty good. It gets more complex once you consider inserting a new word: inserting it into the tree takes O(log n) (assuming the tree stays balanced), inserting it into the sorted list takes O(n) in the worst case, and searching it again takes O(n).

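For better performance on both inserting and searching for the nth most frequent node, one approach is to create a second BST keyed by frequency. Inserting a new node into both trees then takes O(log n), and searching takes O(log n). For the search to locate the nth node in O(log n), both trees must stay balanced and each node of the frequency tree must also store the size of its subtree (an order-statistic tree); when a word's count changes, its node is removed from the frequency tree and re-inserted, which is also O(log n).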
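A minimal sketch of the two-tree idea, assuming String words and int counts. It swaps the hand-written BST for java.util.TreeMap, which is a red-black tree, so both trees stay balanced. TreeMap has no rank query, so this version retrieves the top n by walking the frequency tree from the highest key down rather than by an O(log n) lookup of the nth node. The class name WordFrequencyIndex and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class WordFrequencyIndex {
    // Tree 1: word -> frequency.
    private final TreeMap<String, Integer> counts = new TreeMap<>();
    // Tree 2: frequency -> words with that frequency (ties share one bucket).
    private final TreeMap<Integer, TreeSet<String>> byFrequency = new TreeMap<>();

    // Each update touches both trees with O(log N) operations.
    void add(String word) {
        int old = counts.getOrDefault(word, 0);
        counts.put(word, old + 1);
        if (old > 0) {
            // Move the word out of its old frequency bucket.
            TreeSet<String> bucket = byFrequency.get(old);
            bucket.remove(word);
            if (bucket.isEmpty()) byFrequency.remove(old);
        }
        byFrequency.computeIfAbsent(old + 1, f -> new TreeSet<>()).add(word);
    }

    // Walk tree 2 from the highest frequency down, stopping after n words.
    List<String> mostFrequent(int n) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, TreeSet<String>> e : byFrequency.descendingMap().entrySet()) {
            for (String w : e.getValue()) {
                if (result.size() == n) return result;
                result.add(w);
            }
        }
        return result;
    }
}
```

For example, after adding a, b, b, c, c, c, mostFrequent(2) returns [c, b]. The same String objects are referenced from both maps, so the word text itself is not copied.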

The approach above stores redundant data, i.e. the second tree holds both the word and the frequency. To avoid that, the second BST can store just the frequency plus a reference to the word's node in the first BST; this way you can jump from one tree to the other at any point.
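A sketch of that layout, with hypothetical WordNode and FreqNode types standing in for the question's BinaryNode. Each FreqNode holds only a reference to a WordNode and reads the frequency through it. Because a node's position depends on its count, a real implementation must remove the FreqNode before incrementing the count and re-insert it afterwards; removal is not shown here, and this tree is not self-balancing:

```java
import java.util.ArrayList;
import java.util.List;

class ReferencingFrequencyTree {
    // Node of the first (word-keyed) BST: owns the word and its count.
    static class WordNode {
        final String word;
        int count;
        WordNode left, right;   // links within the word tree (unused in this sketch)
        WordNode(String word) { this.word = word; }
    }

    // Node of the second BST: no copy of the word, only a reference to its WordNode.
    static class FreqNode {
        final WordNode ref;
        FreqNode left, right;
        FreqNode(WordNode ref) { this.ref = ref; }
    }

    FreqNode root;

    // Order by count, breaking ties on the word so equal counts stay distinct.
    private static int compare(WordNode a, WordNode b) {
        if (a.count != b.count) return Integer.compare(a.count, b.count);
        return a.word.compareTo(b.word);
    }

    void insert(WordNode w) { root = insert(w, root); }

    private FreqNode insert(WordNode w, FreqNode t) {
        if (t == null) return new FreqNode(w);
        if (compare(w, t.ref) < 0) t.left = insert(w, t.left);
        else t.right = insert(w, t.right);
        return t;
    }

    // Reverse in-order (right, node, left) visits the highest counts first.
    List<String> mostFrequent(int n) {
        List<String> out = new ArrayList<>();
        collect(root, n, out);
        return out;
    }

    private void collect(FreqNode t, int n, List<String> out) {
        if (t == null || out.size() >= n) return;
        collect(t.right, n, out);
        if (out.size() < n) out.add(t.ref.word);   // follow the reference into tree 1
        collect(t.left, n, out);
    }
}
```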

A solution would be:

  1. Initialize a TreeSet<Node> result sorted by node word frequency, breaking ties by word (a TreeSet drops any element its comparator considers equal to one already present, so sorting on frequency alone would lose words with equal counts).
  2. Add the first n elements from your tree to the set.
  3. Iterate through the rest of the elements, replacing the lowest value in the set whenever a higher one appears: if current's frequency > result.first()'s frequency, then result.pollFirst(); result.add(current)
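A minimal sketch of those three steps, assuming a simplified Node with a String word and an int count in place of BinaryNode<String, Integer>; the class and method names are hypothetical. Any traversal order works, and in-order is used here:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TopNWithTreeSet {
    // Simplified stand-in for BinaryNode<String, Integer>: a word and its frequency.
    static class Node {
        final String word;
        final int count;
        Node left, right;
        Node(String word, int count) { this.word = word; this.count = count; }
    }

    // Lowest frequency first; ties are broken by word, because a TreeSet
    // silently drops an element that compares equal to one already present.
    static final Comparator<Node> BY_FREQUENCY =
            Comparator.comparingInt((Node x) -> x.count).thenComparing((Node x) -> x.word);

    // Returns up to n nodes with the highest frequencies, highest first.
    static List<Node> mostFrequent(Node root, int n) {
        TreeSet<Node> result = new TreeSet<>(BY_FREQUENCY);   // step 1
        if (n > 0) visit(root, n, result);
        return new ArrayList<>(result.descendingSet());
    }

    private static void visit(Node t, int n, TreeSet<Node> result) {
        if (t == null) return;
        visit(t.left, n, result);
        if (result.size() < n) {
            result.add(t);                        // step 2: take the first n nodes
        } else if (t.count > result.first().count) {
            result.pollFirst();                   // step 3: evict the lowest...
            result.add(t);                        // ...and keep the more frequent node
        }
        visit(t.right, n, result);
    }
}
```

With N nodes in the tree this does O(N log n) work and never holds more than n nodes in the set.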

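This has a limited space cost (the set never holds more than n nodes) and should be faster: with N words in the tree it runs in O(N log n) instead of the O(N log N) of a full sort, and most elements are typically skipped after a single comparison with result.first().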

Note, however, that unless you are dealing with huge inputs and have traced slowdowns to this function, the simplicity of your solution makes it the better choice.

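Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.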

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM