Implementing Flajolet and Martin's Algorithm in Python

The following is the code I've written to implement Flajolet and Martin's algorithm. I've used the Jenkins hash function to generate a 32-bit hash value of the data. The program seems to follow the algorithm but is off the mark by about 20%: my data set consists of more than 200,000 unique records, whereas the program outputs about 160,000 unique records. Please help me understand the mistake(s) I am making. The hash function is implemented as per Bob Jenkins' website.

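import numpy as np
from jenkinshash import jhash

# Not shown in the original post; assumed here to be the Flajolet-Martin
# correction factor (about 0.77351).
MAGIC_CONST = 0.77351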

class PCSA():
    def __init__(self, nmap, maxlength):
        self.nmap = nmap
        self.maxlength = maxlength
        self.bitmap = np.zeros((nmap, maxlength), dtype=int)

    def count(self, data):
        hashedValue = jhash(data)
        indexAlpha = hashedValue % self.nmap
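        ix = hashedValue // self.nmap  # floor division keeps ix an int for bin()
        ix = bin(ix)[2:][::-1]  # binary digits, least significant bit first
        indexBeta = ix.find("1")  # index of the lowest set bit (-1 if ix == 0)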
        if self.bitmap[indexAlpha, indexBeta] == 0:
            self.bitmap[indexAlpha, indexBeta] = 1


    def getCardinality(self):
        sumIx = 0
        for row in range(self.nmap):
            sumIx += np.where(self.bitmap[row, :] == 0)[0][0]

        A = sumIx / self.nmap

        cardinality = self.nmap * (2 ** A)/ MAGIC_CONST

        return cardinality

If you are running this in Python 2, the division that calculates A may be performed as integer division, which truncates A to an integer. Because the estimate scales as 2 ** A, dropping the fractional part of A lowers the result by a factor of up to 2.

If this is the case, you could try changing:

A = sumIx / self.nmap

to

A = float(sumIx) / self.nmap
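
To see how much the truncation can matter, here is a minimal sketch for Python 3, where `//` reproduces what Python 2's `/` does with two integers. The helper `estimate`, the constant `PHI` (standing in for `MAGIC_CONST`, assumed to be about 0.77351), and the input numbers are all hypothetical and are not taken from the question's data.

```python
PHI = 0.77351  # assumed value of MAGIC_CONST (Flajolet-Martin correction factor)


def estimate(sum_ix, nmap, truncate):
    """Return nmap * 2**A / PHI, with A computed with or without truncation."""
    if truncate:
        a = sum_ix // nmap  # floor division, like Python 2's int / int
    else:
        a = sum_ix / nmap   # true division keeps the fractional part of A
    return nmap * (2 ** a) / PHI


# Hypothetical input: 64 bitmaps whose first-zero indices sum to 724,
# so A is 11.3125 with true division and 11 with floor division.
truncated = estimate(724, 64, truncate=True)
exact = estimate(724, 64, truncate=False)
print(truncated, exact, truncated / exact)
```

With these made-up numbers the truncated estimate comes out roughly 20% lower than the untruncated one, since the ratio is 2 ** (11 - 11.3125). The size of the gap depends entirely on the fractional part of A, so other inputs will give a different ratio.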
