Implementing Flajolet and Martin's Algorithm in Python
The following is the code I've written to implement Flajolet and Martin's algorithm. I've used the Jenkins hash function to generate a 32-bit hash value of the data. The program seems to follow the algorithm, but its estimate is off by about 20%: my data set contains more than 200,000 unique records, whereas the program reports about 160,000. Please help me understand the mistake(s) I am making. The hash function is implemented as per Bob Jenkins' website.
import numpy as np
from jenkinshash import jhash

# Not defined in the original post; 0.77351 is the correction factor from
# Flajolet and Martin's paper and is assumed here.
MAGIC_CONST = 0.77351


class PCSA():
    def __init__(self, nmap, maxlength):
        self.nmap = nmap            # number of bitmaps
        self.maxlength = maxlength  # number of bits per bitmap
        self.bitmap = np.zeros((nmap, maxlength), dtype=int)

    def count(self, data):
        hashedValue = jhash(data)
        indexAlpha = hashedValue % self.nmap  # which bitmap to update
        ix = hashedValue // self.nmap         # remaining bits of the hash
        ix = bin(ix)[2:][::-1]
        indexBeta = ix.find("1")  # find index of lsb
        if self.bitmap[indexAlpha, indexBeta] == 0:
            self.bitmap[indexAlpha, indexBeta] = 1

    def getCardinality(self):
        sumIx = 0
        for row in range(self.nmap):
            # index of the lowest unset bit in this bitmap
            sumIx += np.where(self.bitmap[row, :] == 0)[0][0]
        A = sumIx / self.nmap
        cardinality = self.nmap * (2 ** A) / MAGIC_CONST
        return cardinality
If you are running this in Python 2, the division used to calculate A may truncate A to an integer, because / between two integers performs floor division there.
If this is the case, you could try changing:
A = sumIx / self.nmap
to
A = float(sumIx) / self.nmap
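To illustrate why the truncation matters, here is a minimal sketch (written for Python 3) that compares the estimate computed with floor division against the one computed with true division. The function name estimate, the value MAGIC_CONST = 0.77351, and the sample inputs are assumptions for illustration; they are not taken from the original post.

```python
# Assumed correction factor; the original post does not show its value.
MAGIC_CONST = 0.77351


def estimate(sum_ix, nmap, truncate):
    """Return the cardinality estimate for a given sum of bitmap indices.

    truncate=True mimics Python 2 integer division (A loses its fraction);
    truncate=False keeps A as a float.
    """
    a = sum_ix // nmap if truncate else sum_ix / nmap
    return nmap * (2 ** a) / MAGIC_CONST


# Hypothetical inputs: 64 bitmaps whose lowest-unset-bit indices sum to 750.
nmap = 64
sum_ix = 750

truncated = estimate(sum_ix, nmap, truncate=True)
exact = estimate(sum_ix, nmap, truncate=False)

print("estimate with truncated A:", truncated)
print("estimate with float A:    ", exact)
print("ratio exact / truncated:  ", exact / truncated)
```

Because the estimate scales with 2 ** A, dropping the fractional part of A can only lower the result, by a factor between 1 and 2.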