简体   繁体   中英

Best Data Structure for fast retrieval, update, and keeping ordering

The problem is as follows

  • I need to keep track of url + click count.
  • I need to be able to update url quickly with click count when user click on that url.
  • I need to be able to retrieve the top 10 click count URL quickly.

NOTE: Assuming you cannot use the database.

What is the best data structure to achieve the result?

I have thought about using a map before, but map doesn't keep track of ordering of the top 10 clicks.

You need an additional List<Map.Entry<URL,Integer>> for holding the top ten, with T being the click count for the lowermost.

  • If you count another click and this count is still not greater than T: do nothing.
  • If the increased count is greater than T, check whether the URL is in the list or not. If it is, do nothing. If it is not, add this entry to the List, sort and delete the last entry if the list has more than 10 entries. Update T.

The best data structure I can think is of using the TreeSet.

The elements of TreeSet are sorted, so you can easily find top items. Also make sure for URL you maintain a separate comparator class which implements Comparator, so you can put your logic of keeping elements sorted all the time based on count. Use this comparator while creating the TreeSet. Insertion/Update/delete/Get all operations happen in O(logn)

Here is the code, how you should define the structure.

 TreeSet<URL> treeSet = new TreeSet<URL>(new URLComparator());


class URL {
    private String url;
    int count;

    public URL(String string, int i) {
        url = string;
        count = i;
    }

    @Override
    public int hashCode() {
        return url.hashCode();
    }

    @Override // No need to write this method. Just used it for testing 
    public String toString() {
        return "url : " + url + " ,count : " + count+"\n";
    }

}

One more info- Use hashcode method of your URL class as hashcode of your url.

This is how you define URLComparator class. compare logic is based on URL count.

class URLComparator implements Comparator<URL> {

    @Override
    public int compare(URL o1, URL o2) {
        return new Integer(o2.count).compareTo(o1.count);
    }

}

Testing

TreeSet<URL> treeSet = new TreeSet<URL>(new URLComparator());

treeSet.add(new URL("url1", 12));
treeSet.add(new URL("url2", 0));
treeSet.add(new URL("url3", 5));

System.out.println(treeSet);

Output:-

[url : url1 ,count : 12
, url : url3 ,count : 5
, url : url2 ,count : 0
]

To print top 10 elements, use following code.

Iterator<URL> iterator = treeSet.iterator();
int count = 0;
while(count < 10 && iterator.hasNext() ){
    System.out.println(iterator.next());
    count++;
}

You can use a Map<String, Integer> for the use case as:

  1. It keeps track of key (url) and value (click count)

  2. You can put to the map an updated url with mapped click count when user click on that url.

  3. You can retrieve the top 10 click count after sorting the map based on the entryset

     // create a list out of the entryset of your map Set<Map.Entry<String, Integer>> set = map.entrySet(); List<Map.Entry<String, Integer>> list = new ArrayList<>(set); // this can be clubbed in another stub to act on top 'N' click counts list.sort((o1, o2) -> (o2.getValue()).compareTo(o1.getValue())); list.stream().limit(10).forEach(entry -> System.out.println(entry.getKey() + " ==== " + entry.getValue())); 

Using Map, you will have to sort the values for top 10 urls. which will egt you o(nlogn) complexity using comparator for sorting by values.

Another Way is:

Using Doubly linked list(of size 10) with a HashMap (And proceeding in a LRU cache way) Retrieve/Update will be o(1). Top 10 results will be items in list.

Structure of Doubly list :

class UrlAndCountNode{
    String url;
    int count;
    UrlAndCountNode next;
    UrlAndCountNode prev;   
}

Structure of Map:

Map<String, UrlAndCountNode>

That's an interesting question IMO. It seems you need something that is sorted by clicks, but at the same time you need to alter these values, the only way to do that with a data structure is to remove that entry (that you want to update) and put the updated one back. Simply updating clicks will not work. As such I think that keeping them sorted by clicks is a batter option.

The downside is that if there are entries with the same number of clicks, they will get overriden, as such something like guava multiset would be a much better option.

As such I would do this:

static class Holder {
    private final String name;

    private final int clicks;

    public Holder(String name, int clicks) {
        super();
        this.name = name;
        this.clicks = clicks;
    }

    public String getName() {
        return name;
    }

    public int getClicks() {
        return clicks;
    }

    @Override
    public String toString() {
        return "name = " + name + " clicks = " + clicks;
    }
}

And methods would look like this:

private static List<Holder> firstN(Multiset<Holder> set, int n) {
    return set.stream().limit(n).collect(Collectors.toList());
}

private static void updateOne(Multiset<Holder> set, String urlName, int more) {
    Iterator<Holder> iter = set.iterator();

    int currentClicks = 0;
    boolean found = false;

    while (iter.hasNext()) {
        Holder h = iter.next();
        if (h.getName().equals(urlName)) {
            currentClicks = h.getClicks();
            iter.remove();
            found = true;
        }
    }

    if (found) {
        set.add(new Holder(urlName, currentClicks + more));
    }

}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM