Implementing Flajolet and Martin's Algorithm in Python

Question

The following is the code I wrote to implement Flajolet and Martin's algorithm. I used the Jenkins hash function to generate a 32-bit hash value of the data. The program seems to follow the algorithm but is off the mark by about 20%: my data set contains more than 200,000 unique records, whereas the program reports about 160,000. Please help me understand the mistake(s) I am making. The hash function is implemented as per Bob Jenkins' website.

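import numpy as np
from jenkinshash import jhash

# Correction factor from the Flajolet-Martin analysis (phi ~= 0.77351).
# Its definition was not shown in the original post; this value is assumed.
MAGIC_CONST = 0.77351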

class PCSA():
    def __init__(self, nmap, maxlength):
        self.nmap = nmap
        self.maxlength = maxlength
        # np.int was removed in recent NumPy releases; the built-in int is equivalent here
        self.bitmap = np.zeros((nmap, maxlength), dtype=int)

    def count(self, data):
        hashedValue = jhash(data)
        indexAlpha = hashedValue % self.nmap
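        # floor division keeps ix an integer on both Python 2 and Python 3
        ix = hashedValue // self.nmap
        # binary digits of ix, least significant bit first
        ix = bin(ix)[2:][::-1]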
        indexBeta = ix.find("1")    #find index of lsb
        if self.bitmap[indexAlpha, indexBeta] == 0:
            self.bitmap[indexAlpha, indexBeta] = 1


    def getCardinality(self):
        sumIx = 0
        for row in range(self.nmap):
            sumIx += np.where(self.bitmap[row, :] == 0)[0][0]

        A = sumIx / self.nmap

        cardinality = self.nmap * (2 ** A)/ MAGIC_CONST

        return cardinality

Answer 1

If you are running this in Python 2, the division used to calculate A may be integer division, which truncates A to an integer. Because the estimate scales as 2 ** A, losing the fractional part of A pushes the result low.

If this is the case, you could try changing:

A = sumIx / self.nmap

to

A = float(sumIx) / self.nmap
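
To see how much the truncation matters, here is a minimal Python 3 sketch of the same structure. It is not the asker's code: the class layout is simplified, `hashlib.sha256` stands in for the Jenkins hash, hash values with no set bit above the row index are simply skipped, and the `truncate` flag is a hypothetical switch added only to reproduce the integer-division behaviour.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant (assumed value)


class PCSA:
    def __init__(self, nmap, maxlength=32):
        self.nmap = nmap
        self.maxlength = maxlength
        self.bitmap = [[0] * maxlength for _ in range(nmap)]

    @staticmethod
    def _hash32(data):
        # Stand-in 32-bit hash (assumption): first 4 bytes of SHA-256
        digest = hashlib.sha256(str(data).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def count(self, data):
        h = self._hash32(data)
        row = h % self.nmap
        rest = h // self.nmap  # floor division, stays an integer
        if rest == 0:
            return  # no set bit to record (simplification)
        # index of the least significant set bit of rest
        rho = (rest & -rest).bit_length() - 1
        if rho < self.maxlength:
            self.bitmap[row][rho] = 1

    def cardinality(self, truncate=False):
        total = 0
        for bits in self.bitmap:
            # position of the first zero, or maxlength if the row is full
            total += bits.index(0) if 0 in bits else self.maxlength
        # truncate=True mimics Python 2 integer division of sumIx by nmap
        mean = total // self.nmap if truncate else total / self.nmap
        return self.nmap * (2 ** mean) / PHI


if __name__ == "__main__":
    sketch = PCSA(256)
    for i in range(200000):
        sketch.count("record-%d" % i)
    print("float mean:    ", sketch.cardinality())
    print("truncated mean:", sketch.cardinality(truncate=True))
```

The truncated estimate can never exceed the float one, because flooring the mean can only lower the exponent. A lost fractional part of about 0.32 already accounts for a 20% shortfall, since 2 ** -0.32 is roughly 0.80.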

Answer 1 was posted on 2015-01-22 19:18:40.
