Implementing the Flajolet and Martin algorithm in Python

Question

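Below is the code I wrote to implement Flajolet and Martin's algorithm. I used the Jenkins hash function to generate a 32-bit hash value of the data. The program seems to follow the algorithm, but it is off by about 20%. My data set contains more than 200,000 unique records, whereas the program outputs about 160,000 unique records. Please help me understand the mistake I am making. The hash function is implemented as described on Bob Jenkins' website.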

import numpy as np
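from jenkinshash import jhash

# The post uses MAGIC_CONST without defining it; 0.77351 is the correction
# factor phi from the Flajolet-Martin paper and is assumed here.
MAGIC_CONST = 0.77351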

class PCSA():
    def __init__(self, nmap, maxlength):
        self.nmap = nmap
        self.maxlength = maxlength
        self.bitmap = np.zeros((nmap, maxlength), dtype=int)  # np.int was removed in NumPy 1.24

    def count(self, data):
        hashedValue = jhash(data)
        indexAlpha = hashedValue % self.nmap
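        ix = hashedValue // self.nmap   # floor division keeps ix an int on Python 3 as well
        ix = bin(ix)[2:][::-1]          # binary digits, least-significant bit first
        indexBeta = ix.find("1")        # index of the least-significant set bit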
        if self.bitmap[indexAlpha, indexBeta] == 0:
            self.bitmap[indexAlpha, indexBeta] = 1


    def getCardinality(self):
        sumIx = 0
        for row in range(self.nmap):
            sumIx += np.where(self.bitmap[row, :] == 0)[0][0]

        A = sumIx / self.nmap

        cardinality = self.nmap * (2 ** A)/ MAGIC_CONST

        return cardinality
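
The `indexBeta` step above finds the least-significant set bit by reversing a binary string. As a side note, here is a minimal sketch of an integer-only way to get the same index. The helper name `lsb_index` and its `width` parameter are hypothetical, and returning `width` for a zero input is an assumption (the string version returns -1 in that case, which would index the last column of the bitmap).

```python
def lsb_index(x, width=32):
    """Return the 0-based position of the lowest set bit of a non-negative int.

    Returns `width` when x == 0 (assumed convention; no bit is set).
    """
    if x == 0:
        return width
    # x & -x isolates the lowest set bit; bit_length() - 1 gives its position.
    return (x & -x).bit_length() - 1


# Same result as the string-based approach for non-zero inputs.
for value in (1, 8, 12, 0b101000):
    assert lsb_index(value) == bin(value)[2:][::-1].find("1")
```

This does not address the 20% error by itself; it only avoids the string round trip and makes the zero case explicit.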

Answer 1

If you are running this in Python 2, the division that computes A may truncate A to an integer.

If that is the case, you could try changing:

A = sumIx / self.nmap

to

A = float(sumIx) / self.nmap
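
To see why the truncation matters, here is a small sketch comparing the two ways of computing A. It is written for Python 3, where `//` reproduces the Python 2 integer division. The values of `nmap` and `sum_ix` are made up for illustration, and `MAGIC_CONST = 0.77351` is assumed to be the usual Flajolet-Martin correction factor since the question does not show its value.

```python
MAGIC_CONST = 0.77351  # assumed value; not shown in the question


def estimate(sum_ix, nmap, truncate):
    """Cardinality estimate using either floored or true division for A."""
    if truncate:
        a = sum_ix // nmap   # what Python 2 does for int / int
    else:
        a = sum_ix / nmap    # true division, same as float(sum_ix) / nmap
    return nmap * (2 ** a) / MAGIC_CONST


nmap = 64                  # hypothetical number of bitmaps
sum_ix = 64 * 11 + 40      # hypothetical sum of first-zero indices

truncated = estimate(sum_ix, nmap, truncate=True)
exact = estimate(sum_ix, nmap, truncate=False)
print(truncated, exact)
```

Because A sits in the exponent, dropping its fractional part can only lower the estimate, by a factor of up to 2. Whether that accounts for the whole 20% gap depends on the actual bitmaps.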

Solution 1
1 2015-01-22 19:18:40
