I'm trying to solve the Nth Ugly Number problem. I'm using a HashSet to avoid adding duplicate elements to the PriorityQueue. I expected HashSet's add() and contains() to be O(1), which is better than PriorityQueue's O(log n) add(). However, I found my implementation is always worse than the PriorityQueue-only solution.
Then I counted conflicts to see the duplicate ratio; it is consistently slightly over 10%. So as n grows, using the HashSet should be better (10% * log(n) >> 90% * C for large n). The weird thing is that as n grows, using the HashSet becomes even worse: from almost the same performance at n = 1,000 / 10,000 / 100,000 to 3 times worse at 1,000,000 and 4 times worse at 10,000,000. I've read (Fastest Java HashSet<Integer> library) that the initial capacity should be 1.5n, so a HashSet usually ends up holding 2.5~3n elements. I'm setting my HashSet's capacity to 4n or 5n, and it doesn't help.
Does anyone know why this happens?
import java.util.HashSet;
import java.util.PriorityQueue;

public class Test {
    int conflict = 0;

    public static void main(String[] args) {
        Test test = new Test();
        long start = System.currentTimeMillis();
        int N = 10000000;
        test.nthUglyNumber(N);
        long end = System.currentTimeMillis();
        System.out.println("Time:" + (end - start));
        start = System.currentTimeMillis();
        test.nthUglyNumber2(N);
        end = System.currentTimeMillis();
        System.out.println("Time:" + (end - start));
    }

    // HashSet + PriorityQueue version. Note: the multiplications overflow
    // for large n (in both versions), so only the timing is meaningful here.
    public int nthUglyNumber(int n) {
        if (n <= 0) {
            return 0;
        }
        HashSet<Integer> CLOSED = new HashSet<Integer>(5 * n);
        PriorityQueue<Integer> OPEN = new PriorityQueue<Integer>();
        int cur = 1;
        OPEN.add(cur);
        CLOSED.add(cur);
        while (n > 1) {
            n--;
            cur = OPEN.poll();
            int cur2 = cur * 2;
            if (CLOSED.add(cur2)) {
                OPEN.add(cur2);
            }
            // else { conflict++; }
            int cur3 = cur * 3;
            if (CLOSED.add(cur3)) {
                OPEN.add(cur3);
            }
            // else { conflict++; }
            int cur5 = cur * 5;
            if (CLOSED.add(cur5)) {
                OPEN.add(cur5);
            }
            // else { conflict++; }
        }
        return OPEN.peek();
    }

    // PriorityQueue-only version: duplicates are skipped at poll time.
    public int nthUglyNumber2(int n) {
        if (n == 1)
            return 1;
        PriorityQueue<Long> q = new PriorityQueue<Long>();
        q.add(1L);
        for (long i = 1; i < n; i++) {
            long tmp = q.poll();
            while (!q.isEmpty() && q.peek() == tmp)
                tmp = q.poll();
            q.add(tmp * 2);
            q.add(tmp * 3);
            q.add(tmp * 5);
        }
        return q.poll().intValue();
    }
}
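For comparison, the usual way to sidestep the duplicate problem entirely for this puzzle is the three-pointer dynamic-programming approach, which needs neither a HashSet nor a PriorityQueue. This is not from the post; it is a minimal sketch of that standard technique:

```java
public class UglyDp {
    // O(n) three-pointer DP: every ugly number is a smaller ugly number
    // times 2, 3, or 5. Advance every pointer that produced the minimum,
    // which skips duplicates without any set lookups.
    static long nthUgly(int n) {
        long[] ugly = new long[n];
        ugly[0] = 1;
        int i2 = 0, i3 = 0, i5 = 0;
        for (int i = 1; i < n; i++) {
            long next = Math.min(ugly[i2] * 2, Math.min(ugly[i3] * 3, ugly[i5] * 5));
            ugly[i] = next;
            if (next == ugly[i2] * 2) i2++; // advance all pointers that hit the min
            if (next == ugly[i3] * 3) i3++;
            if (next == ugly[i5] * 5) i5++;
        }
        return ugly[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(nthUgly(10)); // prints 12
    }
}
```

Since the queue never exists, there is nothing to deduplicate and the garbage-collection pressure discussed below largely disappears (one long[] allocation instead of millions of boxed Integers).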
I don't think that your analysis takes account of memory-management overheads. Each time the GC runs it is going to need to trace and move some or all of the reachable objects in the HashSet. While it is difficult to quantify this in the average case, in the worst case (a full GC) the extra work is O(N).
There could also be secondary memory effects; e.g. the version with the HashSet will have a larger working set, which will lead to more memory-cache misses. This will be most pronounced during garbage collection.
I suggest that you profile the two versions of the code to determine where the extra time is really being consumed.
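One cheap way to test the GC hypothesis before reaching for a full profiler is the standard `GarbageCollectorMXBean` API, which reports cumulative collection time. A sketch (the class name and the allocation-heavy stand-in workload are mine, not from the post):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashSet;

public class GcProbe {
    // Sum the GC time (in ms) reported by all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // Allocation-heavy stand-in for the HashSet version of the benchmark.
        HashSet<Integer> set = new HashSet<>();
        for (int i = 0; i < 5_000_000; i++) set.add(i);
        long after = totalGcMillis();
        System.out.println("GC time during run: " + (after - before) + " ms");
    }
}
```

Wrapping each of the two `nthUglyNumber` calls with this measurement would show how much of the wall-clock difference is GC time rather than algorithmic work.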
If you are looking for ways to make the cache perform better, consider:
- a BitSet, or a set implementation from a 3rd-party library;
- a LinkedHashSet, dropping entries once they have passed the window in which cache hits are possible.
Note that when there's no conflict (90% of the cases), you call add twice: once on the HashSet and once on the PriorityQueue, while the PriorityQueue-only solution calls add only once. Therefore, your HashSet adds an overhead in 90% of the cases, while speeding up only 10% of them.
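The bounded-window idea can be combined with the poll-time dedup from nthUglyNumber2, so the window is only a best-effort filter and a too-small window costs extra queue entries rather than correctness. A sketch of that combination; the window capacity (4 * 1024) and class name are arbitrary assumptions of mine:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class WindowedUgly {
    // Bounded LRU set backed by an access-ordered LinkedHashMap: old entries
    // are evicted once the set exceeds max, keeping the working set small.
    static Set<Long> lruSet(final int max) {
        return Collections.newSetFromMap(new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > max;
            }
        });
    }

    static long nthUgly(int n) {
        PriorityQueue<Long> open = new PriorityQueue<>();
        Set<Long> seen = lruSet(4 * 1024); // best-effort duplicate filter
        open.add(1L);
        seen.add(1L);
        long cur = 1;
        for (int i = 0; i < n; i++) {
            cur = open.poll();
            // Safety net: drain duplicates that slipped past the evicting set.
            while (!open.isEmpty() && open.peek() == cur) open.poll();
            for (long f : new long[] {2, 3, 5}) {
                long next = cur * f;
                if (seen.add(next)) open.add(next);
            }
        }
        return cur;
    }

    public static void main(String[] args) {
        System.out.println(nthUgly(10)); // prints 12
    }
}
```

This works because every duplicate of a value is inserted by a smaller parent, and all smaller parents are polled before the value itself, so the poll-time drain always sees the duplicates together. Whether the smaller working set actually wins would need to be measured.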