
Is usage analysis based on HyperLogLog compliant with GDPR?

Context: we have a telemetry system for our service and would like to track retention, how many users use various features, etc.

There are two options for dealing with user-identifiable information while staying GDPR compliant:

  1. Support deleting user information on request
  2. Keep data for less than 30 days

Option #1 is hard to implement (for a telemetry system). Option #2 doesn't allow answering questions such as "what is the 6-month retention for feature X?".

One idea for answering such questions is to calculate HyperLogLog blobs per feature every day or week and store them separately forever. Going forward, these blobs could be merged to compute distinct counts (dcount) and retention.
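To make the idea concrete, here is a minimal sketch of per-period HLL blobs that are merged later to estimate retention. It is a toy implementation, not any particular library: the names `hll_new`, `hll_add`, `hll_merge` and `hll_count`, the register count, and the use of BLAKE2b as a stand-in 64-bit hash are all assumptions made for illustration.

```python
import hashlib
import math

P = 12          # assumption: 2**12 registers, roughly 1.6% standard error
M = 1 << P

def _hash64(value):
    # Stand-in 64-bit hash; real HLL libraries often use MurmurHash instead.
    digest = hashlib.blake2b(value.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def hll_new():
    return [0] * M

def hll_add(regs, value):
    h = _hash64(value)
    idx = h >> (64 - P)                       # top P bits pick the register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    regs[idx] = max(regs[idx], rank)

def hll_merge(a, b):
    # The union of two sketches is the register-wise maximum.
    return [max(x, y) for x, y in zip(a, b)]

def hll_count(regs):
    alpha = 0.7213 / (1 + 1.079 / M)
    est = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if est <= 2.5 * M and zeros:
        est = M * math.log(M / zeros)         # small-range (linear counting) correction
    return est

# One blob per week for a hypothetical "feature X".
week1, week2 = hll_new(), hll_new()
for i in range(1000):
    hll_add(week1, f"user-{i}")
for i in range(500, 1500):
    hll_add(week2, f"user-{i}")

union = hll_merge(week1, week2)
# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
retained = hll_count(week1) + hll_count(week2) - hll_count(union)
# True values are 1000, 1500 and 500; the estimates are close but not exact.
print(round(hll_count(week1)), round(hll_count(union)), round(retained))
```

Note that merging only works because the same user ID hashes to the same value in both weeks, which is exactly the property that matters for the privacy question.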

Assuming that any user-identifiable information is gone after 30 days (after the user account gets deleted), will the HyperLogLog blobs still allow tracking users, i.e. answering whether a particular user used feature X two years ago?

If they do, the approach is not compliant (although not allowing it does not by itself mean the approach is compliant).

In general HLLs are not GDPR compliant. This issue was somewhat addressed in a recent Google paper (see Section 8: 'Mitigation strategies').

The hash functions used in HLL implementations are usually not cryptographically secure (typically MurmurHash), so even with salting you might still be able to answer the question "is a user part of an HLL data structure or not?", and that's a no-no.
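A rough illustration of that membership question, assuming whoever holds the stored registers can also recompute the hash (and any salt) for a candidate user ID. The helper names, the register layout, and the BLAKE2b stand-in hash are hypothetical. If the user's register is smaller than the rank the user would have produced, the user was definitely not counted; otherwise the user may have been, and for sparse sketches that is a fairly strong signal.

```python
import hashlib

P = 12          # assumption: 2**12 registers
M = 1 << P

def slot_and_rank(user_id, salt=b""):
    # Stand-in for whatever (possibly salted) hash the real pipeline uses.
    digest = hashlib.blake2b(salt + user_id.encode(), digest_size=8).digest()
    h = int.from_bytes(digest, "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    return idx, (64 - P) - rest.bit_length() + 1

def build(users, salt=b""):
    regs = [0] * M
    for u in users:
        idx, rank = slot_and_rank(u, salt)
        regs[idx] = max(regs[idx], rank)
    return regs

def may_contain(regs, user_id, salt=b""):
    # False means "definitely not in the sketch"; True means "possibly in it".
    idx, rank = slot_and_rank(user_id, salt)
    return regs[idx] >= rank

regs = build(f"user-{i}" for i in range(200))
members_hit = sum(may_contain(regs, f"user-{i}") for i in range(200))
outsiders_hit = sum(may_contain(regs, f"user-{i}") for i in range(200, 2200))
print(members_hit, "of 200 members match;", outsiders_hit, "of 2000 non-members match")
```

Every real member always matches, while only a small fraction of non-members do, so the sketch leaks information about individuals as long as the hash can be recomputed.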

AFAIK you would be in compliance keeping HLLs around for longer than 30 days iff you apply a salted cryptographic hash prior to HLL aggregation (e.g. a salted SHA-2, BLAKE2b, or BLAKE3) and you destroy the salt after each <30-day period. This would allow you to keep HLLs covering <30-day intervals. You would not be able to merge HLLs across several intervals, only within a single interval (e.g. 28-day chunks), but that can still be super valuable depending on your business needs.
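A minimal sketch of the salting step, assuming keyed BLAKE2b from Python's `hashlib` and a random per-period salt from `secrets`. The `PeriodHasher` class is a hypothetical helper, not part of any HLL library; its 64-bit output is what you would feed into the HLL instead of the raw user ID.

```python
import hashlib
import secrets

class PeriodHasher:
    """Hypothetical helper: one random salt per (<30-day) period."""

    def __init__(self):
        self._salt = secrets.token_bytes(32)

    def hash64(self, user_id):
        if self._salt is None:
            raise RuntimeError("salt destroyed; this period is closed")
        # Keyed BLAKE2b: without the salt, the value cannot be recomputed.
        digest = hashlib.blake2b(
            user_id.encode(), key=self._salt, digest_size=8
        ).digest()
        return int.from_bytes(digest, "big")

    def destroy(self):
        # Assumption: dropping the reference is enough for this illustration.
        # A real system must also make sure the salt is never persisted or logged.
        self._salt = None

january = PeriodHasher()
february = PeriodHasher()

# Within a period the hash is stable, so HLLs built with it can be merged.
same_period = january.hash64("alice") == january.hash64("alice")
# Across periods the same user hashes differently, so merging is meaningless.
cross_period = january.hash64("alice") == february.hash64("alice")
print(same_period, cross_period)

january.destroy()   # from here on, nobody can test "was alice in January?"
```

This is also why merging is limited to a single interval: once the salts differ, the same user looks like two unrelated users to the sketch.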


 