
Bloom filter to remove duplicates from a stream of integers in O(n)

Original 2013-10-09 06:44:09 8 2 java/ algorithm

How can I create a Bloom filter to remove duplicate elements from a stream of integers in O(n) time and O(1) space? I would appreciate it if someone could point me in the right direction.

2 answers

I'm fairly certain it's just:

For each element:

Check whether it exists in the Bloom filter; if it does, it is likely a duplicate.
Insert it into the Bloom filter.
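
The two steps above can be sketched in Java roughly as follows. This is a minimal illustration, not a tuned implementation: the class names IntBloomFilter and BloomDedup, the double-hashing scheme, and the multiplier constants are all assumptions made for the example, and a real stream would be consumed incrementally rather than passed as an array.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: a tiny Bloom filter over ints, for illustration only. */
final class IntBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    IntBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    // The constants are arbitrary odd multipliers chosen for this sketch.
    private int index(int value, int i) {
        int h1 = value * 0x9E3779B1;
        int h2 = (Integer.rotateLeft(value, 15) * 0x85EBCA6B) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** False means "definitely never inserted"; true means "probably inserted". */
    boolean mightContain(int value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    void put(int value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }
}

final class BloomDedup {
    /** Keeps each element the filter has not (probably) seen before. */
    static List<Integer> dedup(int[] stream, int numBits, int numHashes) {
        IntBloomFilter filter = new IntBloomFilter(numBits, numHashes);
        List<Integer> out = new ArrayList<>();
        for (int x : stream) {
            if (!filter.mightContain(x)) {
                out.add(x); // definitely new, so emit it
            }
            // Inserting is a no-op when all of x's bits are already set.
            filter.put(x);
        }
        return out;
    }
}
```

Every true duplicate is dropped, because a Bloom filter has no false negatives. An element seen for the first time can also be dropped if its bits happen to collide with earlier insertions.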

Now there are two problems with this:

There is a probability of false positives, so some elements that are not duplicates will be reported (and removed) as duplicates.
It's not truly O(1) space (though some people may say it is): the size needs to depend on the number of unique elements, otherwise the error rate increases significantly as the number of elements grows.
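
To make the second point concrete, the standard approximation for the false-positive rate can be written out. The symbols are introduced here and do not appear in the original answer: m is the number of bits in the filter, k the number of hash functions, and n the number of distinct elements inserted so far.

```latex
% Approximate false-positive probability after n distinct insertions
p \approx \left(1 - e^{-kn/m}\right)^{k}

% For fixed m and k, p \to 1 as n \to \infty, so a constant-size filter degrades.
% With the optimal choice k = \frac{m}{n}\ln 2, a target rate p requires
m \approx -\frac{n \ln p}{(\ln 2)^{2}}
```

The required size m grows linearly with n for any fixed target error rate, which is why the filter is not O(1) space in the strict sense.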

I don't believe either of these problems can be avoided given the constraints; both are inherent to using (only) Bloom filters.

If we were dealing with a list rather than a stream, we could get rid of the false positives by recording all the elements flagged by the Bloom filter and then going through the list again, checking against that candidate list to confirm they are actual duplicates. This is still O(n) time, but obviously not O(1) space.
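
One possible reading of this two-pass idea is sketched below. The class name TwoPassDedup, the inlined hashing, and the use of HashSet for the candidate list are assumptions made for the example; the answer does not prescribe them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TwoPassDedup {
    // Illustrative double hashing; the constants are arbitrary odd multipliers.
    private static int index(int value, int i, int numBits) {
        int h1 = value * 0x9E3779B1;
        int h2 = (Integer.rotateLeft(value, 15) * 0x85EBCA6B) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    static List<Integer> dedup(List<Integer> input, int numBits, int numHashes) {
        BitSet bits = new BitSet(numBits);
        Set<Integer> candidates = new HashSet<>();

        // Pass 1: record every value the Bloom filter flags as "probably seen".
        // This set contains all real duplicates plus any false positives.
        for (int x : input) {
            boolean maybeSeen = true;
            for (int i = 0; i < numHashes; i++) {
                int idx = index(x, i, numBits);
                if (!bits.get(idx)) {
                    maybeSeen = false;
                }
                bits.set(idx);
            }
            if (maybeSeen) {
                candidates.add(x);
            }
        }

        // Pass 2: values that were never flagged occur exactly once, so they
        // are emitted directly. Candidates are checked exactly, keeping only
        // their first occurrence.
        Set<Integer> seenCandidates = new HashSet<>();
        List<Integer> out = new ArrayList<>();
        for (int x : input) {
            if (!candidates.contains(x) || seenCandidates.add(x)) {
                out.add(x);
            }
        }
        return out;
    }
}
```

The extra memory is proportional to the number of candidates rather than to the whole list, but it is still not constant.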

Google Guava offers a Bloom filter implementation.

Note that a Bloom filter is not enough by itself. If the Bloom filter claims that a number is not in it, then it is definitely not in it. But if it claims that the number is already in it, there is a chance that it is wrong. So you need another data structure alongside it to be sure, and you use the Bloom filter to reduce the number of lookups in that data structure.
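
A rough sketch of that combination follows, with a hand-rolled filter in front of an exact set. The class name FilteredExactDedup, the offer method, and the exactLookups counter are hypothetical, and an in-memory HashSet stands in for whatever exact (possibly slower or disk-backed) structure would be used in practice.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

final class FilteredExactDedup {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    // Stand-in for the exact structure; assumed to be the expensive part.
    private final Set<Integer> exact = new HashSet<>();
    // Counts how often the exact structure had to be consulted.
    int exactLookups = 0;

    FilteredExactDedup(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Illustrative double hashing; the constants are arbitrary odd multipliers.
    private int index(int value, int i) {
        int h1 = value * 0x9E3779B1;
        int h2 = (Integer.rotateLeft(value, 15) * 0x85EBCA6B) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Returns true if value is new, false if it is a confirmed duplicate. */
    boolean offer(int value) {
        boolean maybeSeen = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = index(value, i);
            if (!bits.get(idx)) {
                maybeSeen = false;
                bits.set(idx);
            }
        }
        if (maybeSeen) {
            // Only a "probably seen" answer needs the exact lookup.
            exactLookups++;
            if (exact.contains(value)) {
                return false;
            }
        }
        // Either the filter ruled the value out, or it was a false positive.
        exact.add(value);
        return true;
    }
}
```

Calling offer for each incoming integer and keeping only those for which it returns true gives exact deduplication. The filter saves lookups only for values it can rule out, and the exact structure still grows with the number of distinct values.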

How to modify my method to search and then remove duplicates in O(N) or O(N * log N)?

How to remove only one max (min) from a list using Java 8 stream API in O(N) time and O(C) space complexity

Remove Duplicates With Stream Distinct

O(n) algorithm to find the odd-number-out in array of consecutive integers from 1 to n(not odd numbers)

filter/remove invalid xml characters from stream

find duplicates in a sorted linkedlist in O(n) time

Find a sum pair in array with duplicates in O(n)

Duplicates in array - possible to solve in o(n)?

Stream a list of n-integers from a file to create n object array

integrate Choco and Bloom filter


The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1
4 2013-10-09 08:39:30

solution2
1 2013-10-09 08:47:31