Data Structure to cache most frequent elements

Suppose I read a stream of integers. The same integer may appear more than once in the stream. I would like to keep a cache of the N integers that have appeared most frequently, with the cache sorted by the frequency of the stream elements.

How would you implement it in Java?

You want to use a binary indexed tree. The code in the link is C++ and should be fairly straightforward to convert to Java (AFAICT the code would be nearly the same):

Paper Peter Fenwick

Implementation in C++
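
For reference, here is a rough Java sketch of a binary indexed (Fenwick) tree used as a frequency table. It is not a port of the linked C++ code; the class and method names (`FrequencyFenwickTree`, `add`, `prefixCount`, `count`) are made up for illustration, and it assumes the stream values fall in a known range 0..size-1. Note that the tree maintains cumulative frequencies; picking the N most frequent values would still need an extra step on top of it.

```java
// Minimal binary indexed (Fenwick) tree over values 0..size-1.
// Hypothetical class; assumes the range of stream values is known up front.
final class FrequencyFenwickTree {
    private final long[] tree; // 1-based internal array, tree[0] is unused

    FrequencyFenwickTree(int size) {
        tree = new long[size + 1];
    }

    // Record one more occurrence of value (0 <= value < size).
    void add(int value) {
        for (int i = value + 1; i < tree.length; i += i & -i) {
            tree[i]++;
        }
    }

    // Number of occurrences of all values in 0..value inclusive.
    long prefixCount(int value) {
        long sum = 0;
        for (int i = value + 1; i > 0; i -= i & -i) {
            sum += tree[i];
        }
        return sum;
    }

    // Number of occurrences of exactly this value.
    long count(int value) {
        long below = (value == 0) ? 0 : prefixCount(value - 1);
        return prefixCount(value) - below;
    }
}
```

Both `add` and `prefixCount` touch O(log size) array cells, which is the main reason to prefer this over a plain array of cumulative counts.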

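// Pairs a stream value with the number of times it has been seen.
public class MyData implements Comparable<MyData> {
  public int frequency = 0;
  public Integer data;

  // Ascending by frequency, so a PriorityQueue keeps the least frequent element at its head.
  @Override
  public int compareTo(MyData that) {
    return Integer.compare(this.frequency, that.frequency);
  }
}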

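Store the instances in a PriorityQueue. Note that a PriorityQueue does not re-order an element whose frequency is changed in place, so remove the element before updating its frequency and add it back afterwards.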
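
A possible way to put this together, as a sketch rather than a definitive implementation: instead of mutating objects that already sit inside the queue, keep the counts in a `HashMap` and build a bounded min-heap on demand. The names `TopNFrequencyCache`, `offer`, and `topN` are hypothetical, and the order among values with equal counts is left unspecified.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: counts every value, reports the N most frequent on request.
final class TopNFrequencyCache {
    private final int capacity;
    private final Map<Integer, Integer> counts = new HashMap<>();

    TopNFrequencyCache(int capacity) {
        this.capacity = capacity;
    }

    // Record one occurrence of value from the stream.
    void offer(int value) {
        counts.merge(value, 1, Integer::sum);
    }

    // Returns up to `capacity` values, most frequent first.
    List<Integer> topN() {
        // Min-heap on count: the head is the least frequent candidate kept so far.
        PriorityQueue<Map.Entry<Integer, Integer>> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            heap.add(e);
            if (heap.size() > capacity) {
                heap.poll(); // evict the least frequent candidate
            }
        }
        // The heap yields ascending counts, so prepend to get descending order.
        LinkedList<Integer> result = new LinkedList<>();
        while (!heap.isEmpty()) {
            result.addFirst(heap.poll().getKey());
        }
        return result;
    }
}
```

Each call to `topN` costs O(M log N) for M distinct values, so this fits best when the cache is read less often than the stream is written.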

Create a class that wraps the int and give it a count field. Create a SortedVector class that extends Vector. Each time an integer occurs, add it to the vector if it isn't there yet. Otherwise, find it, increment its count by 1, then call Collections.sort(this) from within your Vector.
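
Something along these lines, as a minimal sketch. It swaps in composition over an `ArrayList` rather than extending `Vector`, and the names `CountedInt` and `SortedCountList` are invented for illustration. Re-sorting after every element is simple but does more work than strictly needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: a stream value plus how often it has been seen.
final class CountedInt {
    final int value;
    int count = 1;

    CountedInt(int value) {
        this.value = value;
    }
}

// Hypothetical sketch: a list kept sorted by descending count.
final class SortedCountList {
    private final List<CountedInt> items = new ArrayList<>();

    // Add value if unseen, otherwise bump its count; then restore the order.
    void add(int value) {
        CountedInt found = null;
        for (CountedInt c : items) {
            if (c.value == value) {
                found = c;
                break;
            }
        }
        if (found == null) {
            items.add(new CountedInt(value));
        } else {
            found.count++;
        }
        // Most frequent first; the sort is stable, so ties keep their previous order.
        items.sort((a, b) -> Integer.compare(b.count, a.count));
    }

    // Returns the first n values (fewer if not enough distinct values were seen).
    List<Integer> top(int n) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, items.size()); i++) {
            result.add(items.get(i).value);
        }
        return result;
    }
}
```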

Do you know the range of the numbers? If so, it might make sense to use an array. For example, if I knew that the numbers were between 0 and 9, I would make an array of size 10. Each element in this array would count the number of times I've seen the number equal to its index. Then, you just have to remember the most frequently seen number.

For example:

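public class FrequencyCounter {
  private final int[] array = new int[10]; // one counter per value in 0..9
  private int freq_index = -1;             // most frequent value seen so far
  private int freq_count = -1;             // how often that value was seen

  // Count one occurrence of n (must be in 0..9) and update the running maximum.
  public void readVal(int n) {
    array[n] += 1;
    if (array[n] > freq_count) {
      freq_index = n;
      freq_count = array[n];
    }
  }
}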

Of course, this approach is bad if the numbers are sparsely distributed over a large range, because most of the array would never be used.
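
For the sparse case, one option is to swap the array for a `HashMap` so that only values actually seen take up space. This is a sketch under that assumption; `SparseFrequencyCounter` and its accessors are hypothetical names, and like the array version it only tracks the single most frequent value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: same idea as the array version, but keyed by value.
final class SparseFrequencyCounter {
    private final Map<Integer, Integer> counts = new HashMap<>();
    private int freqValue = -1; // only meaningful once freqCount > 0
    private int freqCount = 0;

    // Count one occurrence of n and update the running maximum.
    void readVal(int n) {
        int c = counts.merge(n, 1, Integer::sum);
        if (c > freqCount) {
            freqValue = n;
            freqCount = c;
        }
    }

    int mostFrequent() {
        return freqValue;
    }

    int mostFrequentCount() {
        return freqCount;
    }
}
```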

I'd try a priority queue.
