
Set-based hash (digest) algorithm?

Originally posted 2013-07-20 00:51:24 · Score 8 · 3 answers · Tags: algorithm, hash, set

Is there a message digest algorithm such that you can apply set operations to the digest and the result still makes sense? In other words, is there a hash function that does NOT break the concept of a "set" before and after hashing?

I'm looking for a hash function that:

1. hashes a set of data into a fixed-length (or bounded-length) string;
2. produces an identical hash whenever the input data set is the same;
3. commutes with taking subsets: if you select a subset of your raw data, hashing that subset gives the same result as applying the subset selection to the hash of the original data set, i.e. you get the same subset hash either way.

As an example, suppose set A contains several data points (drawn as red diamonds in the original picture) and B is a subset of A. Is there a hash function such that:

data in A ---- hash function ----> _hashA ---- set operation ----> _hashB

data in B ---- hash function ----> _hashB
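For comparison, here is a minimal Python sketch (an illustration, not something proposed in the question or the answers) of an order-independent set digest: XOR together a SHA-256 hash of each element. It satisfies the fixed-length and same-set-same-hash requirements. It can also derive _hashB from _hashA, but only if you also supply the elements of A that are not in B. The subset cannot be selected from _hashA alone. The function names and data are hypothetical. Because XOR is linear, this construction is easy to attack, so treat it as a toy rather than a secure digest.

```python
import hashlib
from functools import reduce


def element_hash(item: str) -> int:
    """Map one element to a 256-bit integer using SHA-256."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest(), "big")


def set_hash(items: set[str]) -> int:
    """Order-independent digest: XOR of all element hashes (0 for the empty set)."""
    return reduce(lambda acc, item: acc ^ element_hash(item), items, 0)


def remove_from_hash(digest: int, removed: set[str]) -> int:
    """Digest of S - removed, given only digest(S) and the removed elements.

    Assumes `removed` is a subset of S; otherwise extra elements get XORed in.
    """
    return digest ^ set_hash(removed)


# Hypothetical data points standing in for the diamonds in the question.
A = {"p1", "p2", "p3", "p4"}
B = {"p1", "p3"}

hash_a = set_hash(A)
hash_b = remove_from_hash(hash_a, A - B)  # needs A - B, not just hash_a
print(hash_b == set_hash(B))  # True
```

The design choice here is that XOR is commutative and self-inverse. Element order therefore does not matter, and XOR-ing an element's hash a second time removes it from the digest.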


3 answers

Answer 1

To me, at least, this looks somewhat like homomorphic encryption (http://en.wikipedia.org/wiki/Homomorphic_encryption) and somewhat like database privacy schemes such as differential privacy (http://en.wikipedia.org/wiki/Differential_privacy).

In both areas, developers have run into trouble. Once users were allowed a few operations, they found clever ways to combine those operations as building blocks to do anything they wanted, so the system ended up providing no security at all.

In your case, I think you want an operation AndHash such that AndHash(hash(a), hash(b)) = hash(a ∩ b), the hash of the intersection of a and b. That means that, as long as hash({x}) != hash(∅), I can test whether an element x is a member of any set S from the hash of S alone: AndHash(hash({x}), hash(S)) equals hash({x}) if x is in S and hash(∅) otherwise. If I can do this for many candidate elements, I can recover many of the members of a hashed set from its hash value. The hash value must therefore be roughly as large as the set itself, because it contains all the information in it.
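The AndHash argument can be made concrete by using a Bloom filter as the set "hash". This is a swapped-in illustration, since the answer does not mention Bloom filters. In this minimal Python sketch, with hypothetical sizes M and K and hypothetical function names:

- OR-ing two filters gives exactly the filter of the union.
- AND-ing two filters gives only a superset of the filter of the intersection.
- AND-ing a single-element filter with a set's filter tests membership, with occasional false positives.

Keeping those false positives rare requires M to grow with the number of elements. This matches the point that such a hash has to be roughly as large as the set.

```python
import hashlib

M = 256  # filter size in bits (hypothetical choice)
K = 3    # bit positions per element (hypothetical choice)


def positions(item: str) -> list[int]:
    """K bit positions for an item, each from a differently prefixed SHA-256 digest."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest(), "big") % M
        for i in range(K)
    ]


def bloom(items) -> int:
    """Bloom filter of a set, stored as an M-bit integer."""
    bits = 0
    for item in items:
        for p in positions(item):
            bits |= 1 << p
    return bits


def or_hash(h1: int, h2: int) -> int:
    """Exact: or_hash(bloom(a), bloom(b)) == bloom(a | b)."""
    return h1 | h2


def and_hash(h1: int, h2: int) -> int:
    """Approximate: the result contains every bit of bloom(a & b), possibly more."""
    return h1 & h2


def probably_contains(set_digest: int, item: str) -> bool:
    """Membership test from the digest alone; may return false positives."""
    single = bloom({item})
    return and_hash(single, set_digest) == single


a = {"alice", "bob"}
b = {"bob", "carol"}
print(or_hash(bloom(a), bloom(b)) == bloom(a | b))  # True
print(probably_contains(bloom(a), "alice"))         # True
```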

Depending on what you want this for, it might be worth looking at MinHash (http://en.wikipedia.org/wiki/Minhash).
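MinHash supports one set operation on the digest exactly. The signature of a union is the element-wise minimum of the two signatures, because the minimum over A ∪ B is the smaller of the minima over A and over B. The fraction of matching positions also estimates the Jaccard similarity |A ∩ B| / |A ∪ B|. MinHash does not let you compute the signature of an intersection or of an arbitrary subset from a signature. Below is a minimal Python sketch. NUM_HASHES and the seeded SHA-256 hash family are assumptions for illustration, not a standard library API.

```python
import hashlib

NUM_HASHES = 128  # signature length (hypothetical choice)


def seeded_hash(seed: int, item: str) -> int:
    """One member of a hash family: SHA-256 of "seed:item", truncated to 64 bits."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(items: set[str]) -> list[int]:
    """MinHash signature of a non-empty set: the minimum of each seeded hash over the set."""
    return [min(seeded_hash(seed, x) for x in items) for seed in range(NUM_HASHES)]


def union_signature(sig_a: list[int], sig_b: list[int]) -> list[int]:
    """Signature of A ∪ B computed from the two signatures alone (exact)."""
    return [min(x, y) for x, y in zip(sig_a, sig_b)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing positions, an estimate of |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


A = {f"item{i}" for i in range(0, 60)}
B = {f"item{i}" for i in range(20, 80)}  # true Jaccard similarity: 40 / 80 = 0.5
sig_a, sig_b = minhash(A), minhash(B)
print(union_signature(sig_a, sig_b) == minhash(A | B))  # True
print(estimate_jaccard(sig_a, sig_b))  # an estimate of the true value 0.5
```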

Answer 2

AFAIK, no. Hash functions generally (and I've seen many) operate on a single chunk of data without any regard for what that data actually represents; their primary concern is to reduce the probability of collisions. That said, it's certainly possible to come up with something like what you want, but I imagine it would be exceedingly difficult, and the result would most likely be suboptimal in terms of collision avoidance.
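To illustrate the point about ordinary hash functions, you can hash a canonical serialization of the set with SHA-256, using sorted, de-duplicated, length-prefixed elements. That gives a fixed-length digest that is the same for equal sets, but nothing about a subset can be computed from it; you have to rehash the subset. A minimal Python sketch, with a hypothetical function name:

```python
import hashlib
from collections.abc import Iterable


def canonical_set_digest(items: Iterable[str]) -> str:
    """SHA-256 over the sorted, de-duplicated elements, each prefixed with its byte length."""
    h = hashlib.sha256()
    for item in sorted(set(items)):
        data = item.encode("utf-8")
        # The length prefix keeps {"ab", "c"} and {"a", "bc"} from serializing identically.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Order and duplicates do not matter, so equal sets give equal digests.
print(canonical_set_digest(["p2", "p1", "p2"]) == canonical_set_digest(["p1", "p2"]))  # True
# The digest of a subset cannot be derived from the full digest; it must be recomputed.
subset_digest = canonical_set_digest(["p1"])
```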

Answer 3

The short answer is no, there isn't such an algorithm. What you might try instead is encrypting your data, decrypting it when you need to apply your set function, and then encrypting it again. Hashing algorithms, however, are by their very nature one-way and lose information. There's a good explanation of the difference between hashing and encryption algorithms in the question "Fundamental difference between Hashing and Encryption algorithms".

Related questions

How to construct a set-based query for a subset?

Can I identify a hash algorithm based on the initial key and output hash?

upgradable digest / checksum algorithm

Python digest/hash for string similarity

Efficient Matching Algorithm for Set Based Triplets

Frequent item set based on Apriori Algorithm and item based recommendation

Best algorithm to hash a string

Hash and reduce to bucket algorithm

Hash algorithm for small files

Hash algorithm shuffling?


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


Answer index (3 answers):
solution1: score 1, 2013-07-20 04:41:46
solution2: score 0, 2013-07-20 01:06:38
solution3: score 0, 2013-07-20 01:07:47
